Abstract: This paper establishes information-theoretic limits for estimating a finite field low-rank matrix given random linear measurements of it. These linear measurements are obtained by taking inner products of the low-rank matrix with random sensing matrices. Necessary and sufficient conditions on the number of measurements required are provided. It is shown that these conditions are sharp and the minimum-rank decoder is asymptotically optimal. The reliability function of this decoder is also derived by appealing to de Caen's lower bound on the probability of a union. The sufficient condition also holds when the sensing matrices are sparse, a scenario that may be amenable to efficient decoding. More precisely, it is shown that if the $n \times n$ sensing matrices contain, on average, $\Omega(n \log n)$ non-zero entries, the number of measurements required is the same as that when the sensing matrices are dense and contain entries drawn uniformly at random from the field. Analogies are drawn between the above results and rank-metric codes in the coding theory literature. In fact, we are also strongly motivated by understanding when minimum rank distance decoding of random rank-metric codes succeeds. To this end, we derive minimum distance properties of equiprobable and sparse rank-metric codes. These distance properties provide a precise geometric interpretation of the fact that the sparse ensemble requires as few measurements as the dense one.

Index Terms: Rank minimization, Finite fields, Reliability function, Sparse parity-check matrices, Rank-metric codes, Minimum rank distance properties



I. Introduction 

This paper considers the problem of rank minimization over finite fields. Our work attempts to connect two seemingly disparate areas of study that have, by themselves, become popular in the information theory community in recent years: (i) the theory of matrix completion [2]-[4] and rank minimization [5], [6] over the reals and (ii) rank-metric codes [7]-[12], which are the rank distance analogs of binary block codes endowed with the Hamming metric. The work herein provides a starting point for investigating the potential impact of the low-rank assumption on information and coding theory. We provide a brief review of these two areas of study.
The problem of matrix completion [2]-[4] can be stated as follows: one is given a subset of noiseless or noisy entries of a low-rank matrix (with entries over the reals) and is then required to estimate all the remaining entries. This problem has a variety of applications, from collaborative filtering (e.g., the Netflix prize [13]) to obtaining the minimal realization of a linear dynamical system [14]. Algorithms based on the nuclear norm (sum of singular values) convex relaxation of the rank function [14], [15] have enjoyed tremendous success. A generalization of the matrix completion problem is the rank minimization problem [5], [6] where, instead of being given entries of the low-rank matrix, one is given arbitrary linear measurements of it. These linear measurements are obtained by taking inner products of the unknown matrix with sensing matrices. The nuclear norm heuristic has also been shown to be extremely effective in estimating the unknown low-rank matrix. Theoretical results [5], [6] are typically of the following flavour: if the number of measurements (also known as the measurement complexity) exceeds a small multiple of the product of the dimension of the matrix and its rank, then optimizing the nuclear-norm heuristic yields the same (optimal) solution as the rank minimization problem under certain conditions on the sensing matrices. Note that in the case of real matrices, if the observations (or the entries) are noisy, perfect reconstruction is impossible. As we shall see in Section V, this is not the case in the finite field setting: we can recover the underlying matrix exactly, albeit at the cost of a higher measurement complexity.

Rank-metric codes [7]-[12] are subsets of finite field matrices endowed with the rank metric. We will be concerned with linear rank-metric codes, which may be characterized by a family of parity-check matrices; these are equivalent to the sensing matrices in the rank minimization problem.

A. Motivations 

Besides analyzing the measurement complexity for rank minimization over finite fields, this paper is also motivated by two applications in coding. The first is index coding with side information [16]. In brief, a sender wants to communicate the $l$-th coordinate of a length-$L$ bit string to the $l$-th of $L$ receivers. Furthermore, each of the $L$ receivers knows a subset of the coordinates of the bit string. These subsets can be represented by (the neighbourhoods of) a graph. Bar-Yossef et al. [16] showed that the linear version of this problem reduces to a rank minimization problem. In previous works, the graph is deterministic. Our work, and in particular the rank minimization problem considered herein, can be cast as the solution of a linear index coding problem with a random side information graph.

Second, we are interested in properties of the rank-metric coding problem [10]. Here, we are given a set of matrix-valued codewords that form a linear rank-metric code $\mathcal{C}$. A codeword $\mathbf{C}^* \in \mathcal{C}$ is transmitted across a noisy finite field matrix-valued channel which induces an additive error matrix $\mathbf{X}$. This error matrix $\mathbf{X}$ is assumed to be low rank. For example, $\mathbf{X}$ could be a matrix induced by the crisscross error model in data arrays [17]. In the crisscross error model, $\mathbf{X}$ is a sparse low-rank matrix in which the non-zero elements are restricted to a small number of rows and columns. The received matrix is $\mathbf{R} := \mathbf{C}^* + \mathbf{X}$. The minimum distance decoding problem is given by the following:

$$\hat{\mathbf{C}} := \arg\min_{\mathbf{C} \in \mathcal{C}} \operatorname{rank}(\mathbf{R} - \mathbf{C}). \tag{1}$$

We would like to study when problem (1) succeeds (i.e., uniquely recovers the true codeword $\mathbf{C}^*$) with high probability (w.h.p.)^1 given that $\mathcal{C}$ is a random code characterized by either dense or sparse random parity-check matrices and $\mathbf{X}$ is a deterministic error matrix. But why analyze random codes? Our study of random (instead of deterministic) codes is motivated by the fact that data arrays that arise in applications are often corrupted by crisscross error patterns [17]. Decoding techniques used in the rank-metric literature, such as error trapping [11], [18], are unfortunately not able to correct such error patterns: because these patterns are highly structured, the "error traps" would miss (or not be able to correct) a non-trivial subset of errors. Indeed, the success of such an error trapping strategy hinges strongly on the assumption that the underlying low-rank error matrix $\mathbf{X}$ is drawn uniformly at random over all matrices whose rank is $r$ [18, Sec. IV] (so that subspaces can be trapped). The decoding technique in [17] is specific to correcting crisscross error patterns. In contrast, in this work, we are able to derive distance properties of random rank-metric codes and to show that, given sufficiently many constraints on the codewords, all error patterns of rank no greater than $r$ can be successfully corrected. Although our derivations are similar in spirit to those in Barg and Forney [19], our starting point is rather different. In particular, we combine the use of techniques from [20] and those in [19].
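To make the decoding rule (1) concrete, the following Python sketch decodes by exhaustive search over a tiny linear rank-metric code defined through parity-check matrices. It is an illustration under stated assumptions, not the method analyzed in this paper: $q = p$ is taken to be prime so that field arithmetic is integer arithmetic mod $p$, and the parity checks below are a hypothetical toy choice that define the deterministic code $\{\mathbf{0}, \mathbf{I}_3\}$ over $\mathbb{F}_2$ (minimum rank distance 3), rather than a random code from the ensembles studied here. The search is exponential in $n^2$.

```python
import itertools

def rank_mod_p(M, p):
    """Rank of M over the prime field F_p, via Gauss-Jordan elimination mod p."""
    A = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        # Pick a pivot in column c among the rows not yet used as pivots.
        piv = next((i for i in range(rank, len(A)) if A[i][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, p)  # inverse of the pivot in F_p (Python 3.8+)
        A[rank] = [(x * inv) % p for x in A[rank]]
        # Clear column c in every other row.
        for i in range(len(A)):
            if i != rank and A[i][c]:
                f = A[i][c]
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

def inner(H, Z, p):
    """<H, Z> = sum_{i,j} H[i][j] * Z[i][j] in F_p."""
    return sum(h * z for hr, zr in zip(H, Z) for h, z in zip(hr, zr)) % p

def sub(A, B, p):
    """Entrywise A - B in F_p."""
    return [[(a - b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def all_matrices(n, p):
    """Enumerate every n x n matrix over F_p."""
    for e in itertools.product(range(p), repeat=n * n):
        yield [list(e[i * n:(i + 1) * n]) for i in range(n)]

def code_from_parity_checks(Hs, n, p):
    """Linear rank-metric code {C : <H_a, C> = 0 for every parity check H_a}."""
    return [C for C in all_matrices(n, p) if all(inner(H, C, p) == 0 for H in Hs)]

def min_rank_distance_decode(R, code, p):
    """All minimizers of rank(R - C) over codewords C, i.e. the solutions of (1)."""
    dists = [rank_mod_p(sub(R, C, p), p) for C in code]
    best = min(dists)
    return [C for C, d in zip(code, dists) if d == best]

def unit(i, j, n=3):
    """Matrix with a single 1 at position (i, j)."""
    return [[1 if (a, b) == (i, j) else 0 for b in range(n)] for a in range(n)]

# Hypothetical parity checks: off-diagonal entries are 0 and Z00 = Z11 = Z22.
Hs = [unit(i, j) for i in range(3) for j in range(3) if i != j]
Hs.append([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
Hs.append([[0, 0, 0], [0, 1, 0], [0, 0, 1]])
code = code_from_parity_checks(Hs, 3, 2)  # the code {0, I_3}

I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
R = [[1, 1, 0], [0, 1, 0], [0, 0, 1]]  # C* = I_3 plus a rank-one error at (0, 1)
print(min_rank_distance_decode(R, code, 2))  # [[[1, 0, 0], [0, 1, 0], [0, 0, 1]]]
```

Because every non-zero codeword of this toy code has rank 3, any rank-one error is corrected uniquely; the random codes studied in the paper are meant to achieve this kind of behaviour at scale without hand-designed structure.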

We are also motivated by the fact that error-exponent-like results for matrix-valued finite field channels are, to the best of the authors' knowledge, not available in the literature. Such channels have been popularized by the seminal work in [21]. Capacity results for specific channel models such as the uniform given rank (u.g.r.) multiplicative noise model [22] have recently been derived. In this work, we derive the error exponent $E(R)$ of the minimum-rank decoder (for the additive noise model). This fills an important gap in the literature.

B. Main Contributions 

We summarize our four main contributions in this work. 
Firstly, by using a standard converse technique (Fano's 
inequality), we derive a necessary condition on the number 

1 Here and in the following, with high probability means with probability tending to one as the problem size tends to infinity.



of measurements required for estimating a low-rank matrix. Furthermore, under the assumption that the linear measurements are obtained by taking inner products of the unknown matrix with sensing matrices containing independent entries that are equiprobable (in $\mathbb{F}_q$), we demonstrate an achievability procedure, called the min-rank decoder, that matches the information-theoretic lower bound on the number of measurements required. Hence, the sufficient condition is sharp. Extensions to the noisy case are also discussed. Note that in this paper, we are less concerned with the computational complexity of recovering the unknown low-rank matrix than with the fundamental limits of doing so.

Secondly, we derive the reliability function (error exponent) $E(R)$ of the min-rank decoder by using de Caen's lower bound on the probability of a union [23]. The use of de Caen's bound to obtain estimates of the reliability function (or probability of error) is not new; see, for example, the works by Seguin [24] and Cohen and Merhav [25]. However, by exploiting pairwise independence of constituent error events, we not only derive upper and lower bounds on $E(R)$ but also show that these bounds are, in fact, tight for all rates (for the min-rank decoder). We derive the corresponding error exponents for codes in [7] and [18] and make comparisons between the error exponents.

Thirdly, we show that if the fraction of non-zero entries of the sensing or measurement matrices scales (on average) as $\Omega(\frac{\log n}{n})$ (where the matrix is of size $n \times n$), the min-rank decoder achieves the information-theoretic lower bound. Thus, if the average number of non-zero entries in each sparse sensing matrix is $\Omega(n \log n)$ (which is much fewer than $n^2$), the number of linear measurements required for reliable reconstruction of the unknown low-rank matrix is, surprisingly, exactly the same as that for the equiprobable (dense) case. This main result opens the possibility for the development of efficient, message-passing decoding algorithms based on sparse parity-check matrices [26].

Finally, we draw analogies between the above results and rank-metric codes [7]-[12] in the coding theory literature. We derive minimum (rank) distance properties of the equiprobable random ensemble and the sparse random ensemble. Using elementary techniques, we derive an analog of the Gilbert-Varshamov distance for the random rank-metric code. We also compare and contrast our result to classical binary linear block codes with the Hamming metric [19]. From these analyses, we obtain geometric intuition to explain why minimum rank decoding performs well even when the sensing matrices are sparse. We also use this geometric intuition to guide our derivation of strong recovery guarantees along the lines of the recent work by Eldar et al. [27].

C. Related Work 

There is a wealth of literature on rank minimization to which we will not be able to do justice here. See, for example, the seminal works by Fazel et al. [14], [15] and the subsequent works by other authors [2]-[4] (and the references therein). However, all these works focus on the case where the unknown matrix is over the reals. We are interested in the finite field setting because such a problem has many connections with



IEEE TRANSACTIONS ON INFORMATION THEORY 



TABLE I

Comparison of our work (Tan-Balzano-Draper) to existing coding-theoretic techniques for rank minimization

Paper            Code Structure    Decoding Technique
Gabidulin [7]    Algebraic         Berlekamp-Massey
SKK [10]         Algebraic         Extended Berlekamp-Massey
MU [11]          Factor Graph      Error Trapping & Message Passing
SKK [18]         Error Trapping    Error Trapping
GLS [33]         Perfect Graph     Semidefinite Program (Ellipsoid)
TBD              See Table II      Min-Rank Decoder (Section VIII)


and applications to coding and information theory [16], [17], [28]. The analogous problem for the reals was considered by Eldar et al. [27]. The results in [27], developed for dense sensing matrices with i.i.d. Gaussian entries, mirror those in this paper, but only achievability results (sufficient conditions) are provided there. We additionally analyze the sparse setting.

Our work is partially inspired by [29], where fundamental limits for compressed sensing over finite fields were derived. To the best of our knowledge, Vishwanath's work [30] is the only one that employs information-theoretic techniques to derive necessary and sufficient conditions on the number of measurements required for reliable matrix completion (or rank minimization). It was shown using typicality arguments that the number of measurements required is within a logarithmic factor of the lower bound. Our setting is different because we assume that we have linear measurements instead of randomly sampled entries. We are able to show that the achievability and converse match for a family of random sensing matrices. Emad and Milenkovic [31] recently extended the analyses in the conference version [1] of this paper to the tensor case, where the rank, the order of the tensor and the number of measurements grow simultaneously with the size of the matrix. We compare and contrast our decoder and analysis for the noisy case to that in [31]. Another recent related work is that by Kakhaki et al. [32], where the authors considered the binary erasure channel (BEC) and binary symmetric channel (BSC) and empirically studied the error exponents for codes whose generator matrices are random and sparse. For the BEC, the authors showed that there exist capacity-achieving codes with generator matrices whose sparsity factor (density) is $O(\frac{\log n}{n})$ (similar to this work). However, motivated by the fact that sparse parity-check matrices may make decoding amenable to lower complexity message-passing type decoders, we analyze the scenario where the parity-check matrices are sparse.

The family of codes known as rank-metric codes [7]-[12], which are the rank-distance analog of binary block codes equipped with the Hamming metric, bears a striking similarity to the rank minimization problem over finite fields. Comparisons between this work and related works in the coding theory literature are summarized in Table I. Our contributions in the various sections of this paper, and other pertinent references, are summarized in Table II. We will further elaborate on these comparisons in Section IX-A.

D. Outline of Paper 

Section II details our notational choices, describes the measurement models and states the problem. In Section III, we
TABLE II

Comparisons between the results in various sections of this paper and other related works

Parity-check matrix H_a    Random low-rank matrix X    Deterministic low-rank matrix X
Random, dense              Section IV                  Section IV
Deterministic, dense       Section IV, [18]            Section VII-C, [7], [10]
Random, sparse             Section VI                  Section VI
Deterministic, sparse      Section VII, [11], [18]     Section VII-C



use Fano's inequality to derive a lower bound on the number of measurements for reconstructing the unknown low-rank matrix. In Section IV, we consider the uniformly at random (or equiprobable) model where the entries of the measurement matrices are selected independently and uniformly at random from $\mathbb{F}_q$. We derive a sufficient condition for reliable recovery and the reliability function of the min-rank decoder using de Caen's lower bound. The results are then extended to the noisy scenario in Section V. Section VI, which contains our main result, considers the case where the measurement matrices are sparse. We derive a sufficient condition on the sparsity factor (density) as well as the number of measurements for reliable recovery. Section VII is devoted to understanding and interpreting the above results from a coding-theoretic perspective. In Section VIII, we provide a procedure to search for the low-rank matrix by exploiting indeterminacies in the problem. Discussions and conclusions are provided in Section IX. The lengthier proofs are deferred to the appendices.

II. Problem Setup and Model 

In this section, we state our notational conventions, describe 
the system model and state the problem. We also distinguish 
between the two related notions of weak and strong recovery. 

A. Notation 

In this paper we adopt the following notation. Serif font and sans-serif font denote deterministic and random quantities, respectively. Bold-face upper-case and bold-face lower-case denote matrices and (column) vectors, respectively. Thus, $y$, $\mathsf{y}$, $\mathbf{X}$ and $\mathsf{X}$ denote a deterministic scalar, a scalar-valued random variable, a deterministic matrix and a random matrix, respectively. Random functions will also be denoted in sans-serif font. Sets (and events) are denoted with calligraphic font (e.g., $\mathcal{U}$). The cardinality of a finite set $\mathcal{U}$ is denoted as $|\mathcal{U}|$. For a prime power $q$, we denote the finite (Galois) field with $q$ elements as $\mathbb{F}_q$. If $q$ is prime, one can identify $\mathbb{F}_q$ with $\mathbb{Z}_q = \{0, \ldots, q-1\}$, the set of integers modulo $q$. The set of $m \times n$ matrices with entries in $\mathbb{F}_q$ is denoted as $\mathbb{F}_q^{m \times n}$. For simplicity, we let $[k] := \{1, \ldots, k\}$ and $y^k := (y_1, \ldots, y_k)$. For a matrix $\mathbf{M}$, the notations $\|\mathbf{M}\|_0$ and $\operatorname{rank}(\mathbf{M})$ respectively denote the number of non-zero elements in $\mathbf{M}$ (the Hamming weight) and the rank of $\mathbf{M}$ in $\mathbb{F}_q$. For a matrix $\mathbf{M} \in \mathbb{F}_q^{m \times n}$, we also use the notation $\operatorname{vec}(\mathbf{M}) \in \mathbb{F}_q^{mn}$ to denote vectorization of $\mathbf{M}$ with its columns stacked on top of one another. For a real number $b$, the notation $[b]^+$ is defined as $\max\{b, 0\}$. Asymptotic notation such as $O(\cdot)$, $\Omega(\cdot)$ and $o(\cdot)$ will be used throughout; see [34, Sec. 1.3] for definitions. For the reader's convenience, we have summarized the symbols used in this paper in Table III.

TABLE III
Table of symbols used in this paper

Notation                               Definition                              Section
$k$                                    Number of measurements                  Section II-B
$r/n \to \gamma$                       Rank-dimension ratio                    Section II-B
$\sigma = \|\mathbf{w}\|_0/k$          Deterministic noise parameter           Section V-A
$\alpha = k/n^2$                       Measurement scaling parameter           Section V-B
$p = \mathbb{E}\|\mathsf{w}\|_0/k$     Random noise parameter                  Section V-B
$\delta = \mathbb{E}\|\mathsf{H}_a\|_0/n^2$   Sparsity factor                  Section VI
$N_{\mathcal{C}}(r)$                   Num. of matrices of rank $r$ in $\mathcal{C}$   Section VII
$d(\mathcal{C})$                       Minimum rank distance of $\mathcal{C}$   Section VII
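The matrix notation above can be mirrored in a few lines of Python, which is convenient for sanity-checking small examples. This sketch assumes $q$ is prime, so that $\mathbb{F}_q$ is identified with the integers modulo $q$ as described above; prime powers would need genuine extension-field arithmetic, which is not shown. The function names are illustrative only.

```python
def rank_mod_q(M, q):
    """rank(M) over F_q for prime q, via Gauss-Jordan elimination mod q."""
    A = [[x % q for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        # Pick a pivot in column c among the rows not yet used as pivots.
        piv = next((i for i in range(rank, len(A)) if A[i][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, q)  # inverse of the pivot in F_q (Python 3.8+)
        A[rank] = [(x * inv) % q for x in A[rank]]
        # Clear column c in every other row.
        for i in range(len(A)):
            if i != rank and A[i][c]:
                f = A[i][c]
                A[i] = [(a - f * b) % q for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

def hamming_weight(M):
    """||M||_0: the number of non-zero entries of M."""
    return sum(1 for row in M for x in row if x != 0)

def vec(M):
    """vec(M): the columns of M stacked on top of one another."""
    return [M[i][j] for j in range(len(M[0])) for i in range(len(M))]

def positive_part(b):
    """[b]^+ = max{b, 0}."""
    return max(b, 0)

M = [[1, 2], [2, 1]]
print(rank_mod_q(M, 3), rank_mod_q(M, 5))  # 1 2  (det M = -3 vanishes mod 3 only)
print(vec([[1, 2], [3, 4]]))               # [1, 3, 2, 4]
```

The example shows that the rank of the same integer matrix depends on the field: $\det \mathbf{M} = -3$ is zero in $\mathbb{F}_3$ but not in $\mathbb{F}_5$.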

B. System Model 

We are interested in the following model: let $\mathbf{X}$ be an unknown (deterministic or random) square^2 matrix in $\mathbb{F}_q^{n \times n}$ whose rank is less than or equal to $r$, i.e., $\operatorname{rank}(\mathbf{X}) \le r$. The upper bound on the rank $r$ is allowed to be a function of $n$, i.e., $r = r_n$. We assume that $r/n \to \gamma$ and we say that the limit $\gamma \in [0, 1]$ is the rank-dimension ratio.^3 We would like to recover or estimate $\mathbf{X}$ from $k$ linear measurements

$$y_a = \langle \mathbf{H}_a, \mathbf{X} \rangle := \sum_{(i,j) \in [n]^2} [\mathbf{H}_a]_{i,j} [\mathbf{X}]_{i,j}, \quad a \in [k], \tag{2}$$

i.e., $y_a$ is the trace of $\mathbf{H}_a \mathbf{X}^T$. In (2), the sensing or measurement matrices $\mathbf{H}_a \in \mathbb{F}_q^{n \times n}$, $a \in [k]$, are random matrices chosen according to some probability mass function (pmf). The $k$ scalar measurements $y_a \in \mathbb{F}_q$, $a \in [k]$, are available for estimating $\mathbf{X}$. We will operate in the so-called high-dimensional setting and allow the number of measurements $k$ to depend on $n$, i.e., $k = k_n$. Multiplication and addition in (2) are performed in $\mathbb{F}_q$. In the subsequent sections, we will also be interested in a generalization of the model in (2) where the measurements $y_a$, $a \in [k]$, may be noisy, i.e.,

$$y_a = \langle \mathbf{H}_a, \mathbf{X} \rangle + w_a, \quad a \in [k], \tag{3}$$

where $w_a$, $a \in [k]$, represents random or deterministic noise. We will specify precise noise models in Section V.
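As an illustration of (2) and (3), the sketch below computes $y_a = \langle \mathbf{H}_a, \mathbf{X} \rangle$ both as the entrywise sum in (2) and as $\operatorname{trace}(\mathbf{H}_a \mathbf{X}^T)$, and optionally adds a noise value $w_a$. It assumes $q$ is prime (arithmetic mod $q$); the function names and the example matrices are hypothetical.

```python
def inner(H, X, q):
    """<H, X> := sum over (i, j) of [H]_{ij} [X]_{ij}, computed in F_q (prime q)."""
    return sum(h * x for hr, xr in zip(H, X) for h, x in zip(hr, xr)) % q

def trace_H_XT(H, X, q):
    """trace(H X^T) mod q, computed through an explicit matrix product."""
    n, m = len(H), len(H[0])
    # (H X^T)[i][l] = sum_t H[i][t] * X[l][t]
    prod = [[sum(H[i][t] * X[l][t] for t in range(m)) for l in range(n)]
            for i in range(n)]
    return sum(prod[i][i] for i in range(n)) % q

def measure(Hs, X, q, noise=None):
    """Noiseless measurements (2), or noisy measurements (3) if a noise vector is given."""
    w = noise if noise is not None else [0] * len(Hs)
    return [(inner(H, X, q) + w_a) % q for H, w_a in zip(Hs, w)]

q = 3
X = [[1, 2], [0, 1]]
H = [[2, 1], [1, 0]]
print(inner(H, X, q), trace_H_XT(H, X, q))  # 1 1
print(measure([H], X, q, noise=[2]))        # [0]
```

Only the diagonal of $\mathbf{H}_a \mathbf{X}^T$ contributes to the trace, and its $i$-th entry is the inner product of row $i$ of $\mathbf{H}_a$ with row $i$ of $\mathbf{X}$, which is why the two computations agree.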

The measurement models we are concerned with in this paper, (2) and (3), are somewhat different from the matrix completion problem [2]-[4]. In the matrix completion setup, a subset of entries $\Omega \subseteq [n]^2$ in the matrix $\mathbf{X}$ is observed and one would like to "fill in" the rest of the entries assuming the matrix is low-rank. This model can be captured by (2) by choosing each sensing matrix $\mathbf{H}_a$ to be non-zero only in a single position. Assuming $\mathbf{H}_a \ne \mathbf{H}_{a'}$ for all $a \ne a'$, the number of measurements is $k = |\Omega|$. In contrast, our measurement models in (2) and (3) do not assume that $\|\mathbf{H}_a\|_0 = 1$. The sensing matrices are, in general, dense, although in Section VI

2 Our results are not restricted to the case where $\mathbf{X}$ is square, but for the most part in this paper, we assume that $\mathbf{X}$ is square for ease of exposition.

3 Our results also include the regime where $r = o(n)$, but the case where $r = \Theta(n)$ (and $\gamma$ is the proportionality constant) is of greater interest and significance. This is because the rank $r$ grows as rapidly as possible and hence this regime is the most challenging. Note that if $r/n \to \gamma = 1$, then we would need $n^2$ measurements to recover $\mathbf{X}$, since we are not making any low-rank assumptions on it. This is corroborated by the converse in Proposition 2.



we also analyze the scenario where $\mathbf{H}_a$ is relatively sparse. Our setting is more similar in spirit to the rank minimization problems analyzed in Recht et al. [5], Meka et al. [6] and Eldar et al. [27]. However, these works focus on problems in the reals, whereas our focus is the finite field setting.

C. Problem Statement 

Our objective is to estimate the unknown low-rank matrix $\mathbf{X}$ given $y^k$ (and the measurement matrices $\mathbf{H}_a$, $a \in [k]$). In general, given the measurement model in (2) and without any assumptions on $\mathbf{X}$, the problem is ill-posed and it is not possible to recover $\mathbf{X}$ if $k < n^2$. However, because $\mathbf{X}$ is assumed to have rank no larger than $r$ (and $r/n \to \gamma$), we can exploit this additional information to estimate $\mathbf{X}$ with $k < n^2$ measurements. Our goal in this paper is to characterize necessary and sufficient conditions on the number of measurements $k$ as $n$ becomes large, assuming a particular pmf governing the sensing matrices $\mathbf{H}_a$, $a \in [k]$, and under various (random and deterministic) models on $\mathbf{X}$.

D. Weak Versus Strong Recovery 

In this paper, we will focus (in Sections III to VI) on the so-called weak recovery problem, where the unknown low-rank matrix $\mathbf{X}$ is fixed and we ask how many measurements $k$ are sufficient to recover $\mathbf{X}$ (and what the procedure is for doing so). However, there is also a companion problem known as the strong recovery problem, where one would like to recover all matrices in $\mathbb{F}_q^{n \times n}$ with rank no larger than $r$. A familiar version of this distinction also arises in compressed sensing.^4

More precisely, given $k$ sensing matrices $\mathbf{H}_a$, $a \in [k]$, we define the linear operator $\mathcal{H} : \mathbb{F}_q^{n \times n} \to \mathbb{F}_q^k$ as

$$\mathcal{H}(\mathbf{X}) := [\langle \mathbf{H}_1, \mathbf{X} \rangle, \langle \mathbf{H}_2, \mathbf{X} \rangle, \ldots, \langle \mathbf{H}_k, \mathbf{X} \rangle]^T. \tag{4}$$

Then, a necessary and sufficient condition for strong recovery is that the operator $\mathcal{H}$ is injective when restricted to the set of all matrices of rank $r$ (or less). Equivalently, there are no non-zero matrices of rank $2r$ (or less) in the nullspace of the operator $\mathcal{H}$ [27, Sec. 2]. This can be observed by noting that for two matrices $\mathbf{X}_1$ and $\mathbf{X}_2$ of rank $r$ (or less) that generate the same linear observations (i.e., $\mathcal{H}(\mathbf{X}_1) = \mathcal{H}(\mathbf{X}_2)$), their difference $\mathbf{X}_1 - \mathbf{X}_2$ lies in the nullspace of $\mathcal{H}$ and has rank at most $2r$ by the triangle inequality.^5 We would thus like to find conditions on $k$ (via, for example, the geometry of the random code) such that the following subset of $\mathbb{F}_q^{n \times n}$

$$\mathcal{R}_{2r} := \{\mathbf{X} \in \mathbb{F}_q^{n \times n} : \operatorname{rank}(\mathbf{X}) \le 2r\} \tag{5}$$

intersects the nullspace of $\mathcal{H}$ only at the zero matrix, with probability tending to one as $n$ grows. As mentioned in Section II-B, we allow $r$ to

4 Analogously in compressed sensing, consider the combinatorial $\ell_0$-norm optimization problem $\min_{\mathbf{x} \in \mathbb{F}^n} \{\|\mathbf{x}\|_0 : \mathbf{A}\mathbf{x} = \mathbf{y}\}$, where the field $\mathbb{F}$ can either be the reals $\mathbb{R}$ [57] or a finite field $\mathbb{F}_q$ [29]. It can be seen that if we want to recover a fixed but unknown $s$-sparse vector $\mathbf{x}$ (weak recovery), $s + 1$ linear measurements suffice w.h.p. However, for strong recovery, where we would like to guarantee recovery for all $s$-sparse vectors, we need to ensure that the nullspace of the measurement matrix $\mathbf{A}$ contains no non-zero $2s$-sparse vectors. Thus, w.h.p., $2s$ measurements are required for strong recovery [27], [29].

5 Note that $(\mathbf{A}, \mathbf{B}) \mapsto \operatorname{rank}(\mathbf{A} - \mathbf{B})$ is a metric on the space of matrices.
grow linearly with $n$ (with proportionality constant $\gamma$). Under the condition that $\operatorname{nullspace}(\mathcal{H})$ contains no non-zero matrix from the set in (5), the solution to the rank minimization problem [stated precisely in (12) below] is unique and correct for all low-rank matrices with probability tending to one as $n$ grows. As we shall see in Section VII-C, the conditions on $k$ for strong recovery are more stringent than those for weak recovery. See the recent paper by Eldar et al. [27, Sec. 2] for further discussions on weak versus strong recovery in the real field setting.
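For very small $n$, the strong recovery condition can be checked by exhaustive search. The sketch below tests whether $\operatorname{nullspace}(\mathcal{H})$ contains a non-zero matrix of rank at most $2r$, i.e., a non-zero element of the set in (5). It assumes a prime $q$ and enumerates all $q^{n^2}$ matrices, so it is an illustration of the definition rather than a practical test; the sensing matrices used are hypothetical unit matrices.

```python
import itertools

def rank_mod_p(M, p):
    """Rank of M over the prime field F_p, via Gauss-Jordan elimination mod p."""
    A = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        piv = next((i for i in range(rank, len(A)) if A[i][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, p)  # inverse of the pivot in F_p (Python 3.8+)
        A[rank] = [(x * inv) % p for x in A[rank]]
        for i in range(len(A)):
            if i != rank and A[i][c]:
                f = A[i][c]
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

def inner(H, Z, p):
    """<H, Z> in F_p."""
    return sum(h * z for hr, zr in zip(H, Z) for h, z in zip(hr, zr)) % p

def strong_recovery_holds(Hs, n, r, p):
    """True iff the only matrix Z with rank(Z) <= 2r and H(Z) = 0 is Z = 0."""
    for e in itertools.product(range(p), repeat=n * n):
        if not any(e):
            continue  # the zero matrix is always in the nullspace; skip it
        Z = [list(e[i * n:(i + 1) * n]) for i in range(n)]
        if all(inner(H, Z, p) == 0 for H in Hs) and rank_mod_p(Z, p) <= 2 * r:
            return False
    return True

def unit(i, j, n):
    """Matrix with a single 1 at position (i, j)."""
    return [[1 if (a, b) == (i, j) else 0 for b in range(n)] for a in range(n)]

full = [unit(i, j, 2) for i in range(2) for j in range(2)]  # k = n^2: H is injective
partial = full[:3]  # entry (1, 1) is never measured, so E_11 lies in the nullspace
print(strong_recovery_holds(full, 2, 1, 2), strong_recovery_holds(partial, 2, 1, 2))  # True False
```

With `partial`, the nullspace is $\{\mathbf{0}, \mathbf{E}_{11}\}$, and $\mathbf{E}_{11}$ has rank one, so strong recovery fails for $r = 1$ but holds trivially for $r = 0$.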

E. Bounds on the number of low-rank matrices 

In the sequel, we will find it useful to leverage the following lemma, which is a combination of results stated in [21, Lemma 4], [9, Proposition 1] and [12, Lemma 5].

Lemma 1 (Bounds on the number of low-rank matrices). Let $\Phi_q(n, r)$ and $\Psi_q(n, r)$ respectively be the number of matrices in $\mathbb{F}_q^{n \times n}$ of rank exactly $r$ and the number of matrices in $\mathbb{F}_q^{n \times n}$ of rank less than or equal to $r$. Note that $\Psi_q(n, r) = \sum_{l=0}^{r} \Phi_q(n, l)$. The following bounds hold:

$$q^{(2n-2)r - r^2} \le \Phi_q(n, r) \le 4 q^{2nr - r^2}, \tag{6}$$

$$q^{2nr - r^2} \le \Psi_q(n, r) \le 4 q^{2nr - r^2}. \tag{7}$$

In other words, we have from (7) and the fact that $r/n \to \gamma$ that $\left| \frac{1}{n^2} \log_q \Psi_q(n, r) - 2\gamma(1 - \gamma/2) \right| \to 0$.
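The bounds in Lemma 1 are easy to check numerically. The sketch below brings in the standard closed form $\Phi_q(n, r) = \prod_{i=0}^{r-1} (q^n - q^i)^2 / (q^r - q^i)$ for the number of $n \times n$ matrices of rank exactly $r$ (a well-known counting formula, not stated in the text above), verifies (6) and (7) with exact integer and rational arithmetic for small parameters, and illustrates the convergence of $\frac{1}{n^2} \log_q \Psi_q(n, r)$ to $2\gamma(1 - \gamma/2)$. The parameter ranges are arbitrary choices.

```python
import math
from fractions import Fraction

def phi(n, r, q):
    """Phi_q(n, r): number of n x n matrices over F_q of rank exactly r."""
    num, den = 1, 1
    for i in range(r):
        num *= (q**n - q**i) ** 2
        den *= q**r - q**i
    return num // den  # exact: the quotient is an integer

def psi(n, r, q):
    """Psi_q(n, r) = sum_{l=0}^{r} Phi_q(n, l): matrices of rank at most r."""
    return sum(phi(n, l, q) for l in range(r + 1))

def lemma1_holds(n, r, q):
    """Check the bounds (6) and (7) exactly for one choice of n, r, q."""
    e = 2 * n * r - r * r
    lower6 = Fraction(q) ** ((2 * n - 2) * r - r * r)  # exponent may be negative
    ok6 = lower6 <= phi(n, r, q) <= 4 * q**e
    ok7 = q**e <= psi(n, r, q) <= 4 * q**e
    return ok6 and ok7

print(phi(2, 1, 2), psi(2, 1, 2))  # 9 10
print(all(lemma1_holds(n, r, q) for q in (2, 3, 5)
          for n in range(1, 6) for r in range(n + 1)))  # True
# gamma = 1/2: the normalized exponent should be close to 2 * 0.5 * (1 - 0.25) = 0.75.
print(math.log(psi(40, 20, 2), 2) / 40**2)
```

Summing $\Phi_q(n, r)$ over all ranks recovers the total count $q^{n^2}$, which is a quick consistency check on the closed form.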

III. A Necessary Condition for Recovery 

This section presents a necessary condition on the scaling of $k$ with $n$ for the matrix $\mathbf{X}$ to be recovered reliably, i.e., for the error probability in estimating $\mathbf{X}$ to tend to zero as $n$ grows. As with most other converse statements in information theory, it is necessary to assume a statistical model on the unknown object, in this case $\mathbf{X}$. Hence, in this section, we denote the unknown low-rank matrix as $\mathsf{X}$ (a random variable). We also assume that $\mathsf{X}$ is drawn uniformly at random from the set of matrices in $\mathbb{F}_q^{n \times n}$ of rank less than or equal to $r$. For an estimator (deterministic or random function) $\hat{\mathsf{X}} : \mathbb{F}_q^k \times (\mathbb{F}_q^{n \times n})^k \to \mathbb{F}_q^{n \times n}$ whose range is the set of all matrices in $\mathbb{F}_q^{n \times n}$ whose rank is less than or equal to $r$, we define the error event:

$$\mathcal{E}_n := \{\hat{\mathsf{X}}(\mathsf{y}^k, \mathsf{H}^k) \ne \mathsf{X}\}. \tag{8}$$

This is the event that the estimate $\hat{\mathsf{X}}(\mathsf{y}^k, \mathsf{H}^k)$ is not equal to the true low-rank matrix $\mathsf{X}$. We emphasize that the estimator can either be deterministic or random. In addition, the arguments $(\mathsf{y}^k, \mathsf{H}^k)$ are random, so $\hat{\mathsf{X}}(\mathsf{y}^k, \mathsf{H}^k)$ in the definition of $\mathcal{E}_n$ is a random matrix. We can demonstrate the following:

Proposition 2 (Converse). Fix $\epsilon > 0$ and assume that $\mathsf{X}$ is drawn uniformly at random from all matrices of rank less than or equal to $r$. Also, assume $\mathsf{X}$ is independent of $\mathsf{H}^k$. If

$$k < (2 - \epsilon)\gamma(1 - \gamma/2) n^2, \tag{9}$$

then for any estimator $\hat{\mathsf{X}}$ whose range is the set of matrices in $\mathbb{F}_q^{n \times n}$ whose rank is less than or equal to $r$, $\mathbb{P}(\mathcal{E}_n) \ge \epsilon/4 > 0$ for all $n$ sufficiently large.

Proposition 2 states that the number of measurements $k$ must exceed $2nr - r^2$ (which is approximately $2\gamma(1 - \gamma/2)n^2$) for recovery of $\mathsf{X}$ to be reliable, i.e., for the probability of $\mathcal{E}_n$ to tend to zero as $n$ grows. From a linear algebraic perspective, this means we need at least as many measurements as there are degrees of freedom in the unknown object $\mathsf{X}$. Clearly, the bound in (9) applies to both the noisy and the noiseless models introduced in Section II-B. The proof involves an elementary application of Fano's inequality [35, Sec. 2.10].

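Proof: Consider the following lower bounds on the probability of error $\mathbb{P}(\mathcal{E}_n)$, where all entropies and mutual informations are measured in base-$q$ units:

$$\begin{aligned}
\mathbb{P}(\hat{\mathsf{X}} \ne \mathsf{X}) &\overset{(a)}{\ge} \frac{H(\mathsf{X} \mid \mathsf{y}^k, \mathsf{H}^k) - 1}{\log_q \Psi_q(n, r)} = \frac{H(\mathsf{X}) - I(\mathsf{X}; \mathsf{y}^k, \mathsf{H}^k) - 1}{\log_q \Psi_q(n, r)} \\
&\overset{(b)}{=} \frac{H(\mathsf{X}) - I(\mathsf{X}; \mathsf{y}^k \mid \mathsf{H}^k) - 1}{\log_q \Psi_q(n, r)} \ge \frac{H(\mathsf{X}) - H(\mathsf{y}^k \mid \mathsf{H}^k) - 1}{\log_q \Psi_q(n, r)} \\
&\overset{(c)}{\ge} \frac{H(\mathsf{X}) - k - 1}{\log_q \Psi_q(n, r)} \overset{(d)}{=} 1 - \frac{k}{\log_q \Psi_q(n, r)} - \frac{1}{\log_q \Psi_q(n, r)} = 1 - \frac{k}{\log_q \Psi_q(n, r)} - o(1),
\end{aligned} \tag{10}$$

where (a) is by Fano's inequality (estimating $\mathsf{X}$ given $\mathsf{y}^k$ and $\mathsf{H}^k$), and (b) is because $\mathsf{H}^k$ is independent of $\mathsf{X}$, so $I(\mathsf{X}; \mathsf{y}^k, \mathsf{H}^k) = I(\mathsf{X}; \mathsf{y}^k \mid \mathsf{H}^k) + I(\mathsf{X}; \mathsf{H}^k) = I(\mathsf{X}; \mathsf{y}^k \mid \mathsf{H}^k)$. The unlabeled inequality follows from $I(\mathsf{X}; \mathsf{y}^k \mid \mathsf{H}^k) \le H(\mathsf{y}^k \mid \mathsf{H}^k)$. Inequality (c) is due to the fact that $\mathsf{y}_a$ is $q$-ary for all $a \in [k]$, so

$$H(\mathsf{y}^k \mid \mathsf{H}^k) \le H(\mathsf{y}^k) \le k H(\mathsf{y}_1) \le k \log_q q = k, \tag{11}$$

and finally, (d) is due to the uniformity of $\mathsf{X}$, which gives $H(\mathsf{X}) = \log_q \Psi_q(n, r)$. It can be easily verified that if $k$ satisfies (9) for some $\epsilon > 0$, then $k / \log_q \Psi_q(n, r) < 1 - \epsilon/3$ for $n$ sufficiently large by the lower bound in (7) and the convergence $r/n \to \gamma$. Hence, the right-hand side of (10) is larger than $\epsilon/4$ for all $n$ sufficiently large. ■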
We emphasize that the assumption that the sensing matrices $\mathsf{H}_a$, $a \in [k]$, are statistically independent of the unknown low-rank matrix $\mathsf{X}$ is important. This is to ensure the validity of equality (b) in (10). This assumption is not a restrictive one in practice, since the sensing mechanism is usually independent of the unknown matrix.
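To see the converse at a concrete size, the sketch below evaluates the right-hand side of (10) before the $o(1)$ step, namely $1 - (k+1)/\log_q \Psi_q(n, r)$, with $\Psi_q(n, r)$ computed exactly. It reuses the standard closed-form count of rank-$l$ matrices, which is brought in from outside the text. The example values $n = 40$, $r = 20$ (so $\gamma = 1/2$), $q = 2$ and $\epsilon = 0.2$ are arbitrary.

```python
import math

def log_q_psi(n, r, q):
    """log_q Psi_q(n, r), from the exact integer count of matrices of rank <= r."""
    total = 0
    for l in range(r + 1):
        num, den = 1, 1
        for i in range(l):
            num *= (q**n - q**i) ** 2
            den *= q**l - q**i
        total += num // den  # Phi_q(n, l), an exact integer
    return math.log(total, q)

def fano_lower_bound(k, n, r, q):
    """Lower bound on P(E_n) from (10), before the o(1) step: 1 - (k + 1) / log_q Psi_q(n, r)."""
    return 1 - (k + 1) / log_q_psi(n, r, q)

n, r, q, eps = 40, 20, 2, 0.2
k = 1080  # equals (2 - eps) * gamma * (1 - gamma / 2) * n**2 for these values
print(fano_lower_bound(k, n, r, q), eps / 4)  # roughly 0.10, above eps / 4 = 0.05
```

For $k$ above $\log_q \Psi_q(n, r) \approx 2nr - r^2$, the same expression becomes negative, i.e., the converse says nothing, which is consistent with the achievability result that follows.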

IV. Uniformly Random Sensing Matrices: The Noiseless Case

In this section, we assume the noiseless linear model in (2) and provide sufficient conditions for the recovery of a fixed $\mathbf{X}$ (a deterministic low-rank matrix) given $\mathsf{y}^k$, where $\operatorname{rank}(\mathbf{X}) \le r$. We will also provide the functional form of the reliability function (error exponent) for this recovery problem. To do so, we first consider the following optimization problem:

$$\begin{aligned} \text{minimize} \quad & \operatorname{rank}(\mathbf{X}) \\ \text{subject to} \quad & \langle \mathbf{H}_a, \mathbf{X} \rangle = y_a, \quad a \in [k]. \end{aligned} \tag{12}$$

The optimization variable is $\mathbf{X} \in \mathbb{F}_q^{n \times n}$. Thus, among all the matrices that satisfy the linear constraints in (12), we select one whose rank is the smallest. We call the optimization problem in (12) the min-rank decoder, denoting the set of minimizers as $\mathcal{S} \subseteq \mathbb{F}_q^{n \times n}$. If $\mathcal{S}$ is a singleton set, we also denote the unique optimizer to (12), a random quantity, as $\mathsf{X}^*$. We analyze the error probability that either $\mathcal{S}$ is not a singleton set or $\mathsf{X}^*$ does not equal the true matrix $\mathbf{X}$, i.e., the error event

$$\mathcal{E}_n := \{|\mathcal{S}| > 1\} \cup \big(\{|\mathcal{S}| = 1\} \cap \{\mathsf{X}^* \ne \mathbf{X}\}\big). \tag{13}$$
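The min-rank decoder (12) and the error event (13) can be simulated by brute force for very small $n$. The sketch below assumes $q$ prime and draws the sensing matrices with i.i.d. uniform entries, as in the model (14) adopted for this section; it enumerates all $q^{n^2}$ candidate matrices, which is exactly the exponential cost that makes (12) impractical at scale. The helper names and the rank-one test matrix are hypothetical.

```python
import itertools
import random

def rank_mod_p(M, p):
    """Rank of M over the prime field F_p, via Gauss-Jordan elimination mod p."""
    A = [[x % p for x in row] for row in M]
    rank = 0
    for c in range(len(A[0])):
        piv = next((i for i in range(rank, len(A)) if A[i][c]), None)
        if piv is None:
            continue
        A[rank], A[piv] = A[piv], A[rank]
        inv = pow(A[rank][c], -1, p)  # inverse of the pivot in F_p (Python 3.8+)
        A[rank] = [(x * inv) % p for x in A[rank]]
        for i in range(len(A)):
            if i != rank and A[i][c]:
                f = A[i][c]
                A[i] = [(a - f * b) % p for a, b in zip(A[i], A[rank])]
        rank += 1
    return rank

def inner(H, Z, p):
    """<H, Z> in F_p."""
    return sum(h * z for hr, zr in zip(H, Z) for h, z in zip(hr, zr)) % p

def min_rank_decoder(Hs, ys, n, p):
    """The set S of minimizers of (12): minimum-rank matrices consistent with all measurements."""
    best, S = None, []
    for e in itertools.product(range(p), repeat=n * n):
        Z = [list(e[i * n:(i + 1) * n]) for i in range(n)]
        if any(inner(H, Z, p) != y for H, y in zip(Hs, ys)):
            continue  # Z violates at least one linear constraint
        rk = rank_mod_p(Z, p)
        if best is None or rk < best:
            best, S = rk, [Z]
        elif rk == best:
            S.append(Z)
    return S

def error_event(S, X):
    """Indicator of (13): S is not a singleton, or its single element differs from X."""
    return len(S) != 1 or S[0] != X

def uniform_sensing(k, n, p, rng):
    """k sensing matrices with i.i.d. entries uniform on F_p, as in (14)."""
    return [[[rng.randrange(p) for _ in range(n)] for _ in range(n)] for _ in range(k)]

def empirical_error_rate(X, k, p, trials, seed=0):
    """Fraction of trials in which the min-rank decoder fails to return exactly X."""
    rng = random.Random(seed)
    n, errors = len(X), 0
    for _ in range(trials):
        Hs = uniform_sensing(k, n, p, rng)
        ys = [inner(H, X, p) for H in Hs]
        errors += error_event(min_rank_decoder(Hs, ys, n, p), X)
    return errors / trials

X = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]  # a rank-one matrix in F_2^{3x3}
for k in (3, 6, 9):
    print(k, empirical_error_rate(X, k, 2, trials=20))
```

At such tiny sizes the asymptotic threshold in this section is not meaningful; the simulation only shows how the error event (13) is evaluated and that failures become rarer as $k$ grows.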
The optimization in (12) is, in general, intractable (in fact NP-hard) unless there is additional structure on the sensing matrices $\mathbf{H}_a$ (see discussions in Section IX). Our focus in this paper is on the information-theoretic limits for solving (12) and its variants. We remark that the minimization problem (12) is reminiscent of Csiszar's so-called $\alpha$-decoder for linear codes [36]. In [36], Csiszar analyzed the error exponent of the decoder that minimizes a function $\alpha(\cdot)$ [e.g., the entropy $H(\cdot)$] of the type (or empirical distribution) of a sequence, subject to the sequence satisfying a set of linear constraints.

For this section and Section V, we assume that each element in each sensing matrix is drawn independently and uniformly at random from $\mathbb{F}_q$, i.e., from the pmf

$$P_h(h; q) = 1/q, \quad \forall\, h \in \mathbb{F}_q. \tag{14}$$

We call this the uniform or equiprobable measurement model. For simplicity, throughout this section, we use the notation $\mathbb{P}$ to denote the probability measure associated with the equiprobable measurement model.

A. A Sufficient Condition for Recovery in the Noiseless Case 

In this subsection, we assume the noiseless linear model in (1). We can now exploit ideas from [29] to demonstrate the following achievability (weak recovery) result. Recall that $X$ is non-random and fixed, and we are asking how many measurements $y_1,\ldots,y_k$ are sufficient for recovering $X$.

Proposition 3 (Achievability). Fix $\varepsilon>0$. Under the uniform measurement model as in (14), if

$$k > (2+\varepsilon)\gamma(1-\gamma/2)n^2,\tag{15}$$

then $\mathbb{P}(\mathcal{E}_n)\to0$ as $n\to\infty$.

Note that the number of measurements stipulated by Proposition 3 matches the information-theoretic lower bound in Proposition 2. In this sense, the min-rank decoder prescribed by the optimization problem in (12) is asymptotically optimal, i.e., the bounds are sharp. Note also that in the converse (Proposition 2), the range of the decoder $\hat{X}(\cdot)$ is constrained to be the set of matrices whose rank does not exceed $r$. Hence, the decoder in the converse has additional side information, namely the upper bound on the rank. The min-rank decoder in (12) requires no such knowledge of the rank, and yet it meets the lower bound. We remark that the packing-like achievability proof is much simpler than the typicality-based argument presented by Vishwanath in [30] (albeit in a different setting).

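Proof: To each matrix $Z\in\mathbb{F}_q^{n\times n}$ that is not equal to $X$ and whose rank is no greater than $\operatorname{rank}(X)$, associate the event

$$\mathcal{A}_Z := \{\langle Z,H_a\rangle = \langle X,H_a\rangle,\ \forall\, a\in[k]\}.\tag{16}$$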

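Then we note that

$$\mathbb{P}(\mathcal{E}_n) = \mathbb{P}\Big(\bigcup_{Z:\,Z\neq X,\ \operatorname{rank}(Z)\le\operatorname{rank}(X)}\mathcal{A}_Z\Big)\tag{17}$$

since an error occurs if and only if there exists a matrix $Z\neq X$ such that (i) $Z$ satisfies the linear constraints and (ii) its rank is less than or equal to the rank of $X$. Furthermore, we claim that $\mathbb{P}(\mathcal{A}_Z)=q^{-k}$ for every $Z\neq X$. This follows because

$$\mathbb{P}(\mathcal{A}_Z) = \mathbb{P}\big(\langle Z-X,H_a\rangle=0,\ a\in[k]\big) \overset{(a)}{=} \mathbb{P}\big(\langle Z-X,H_1\rangle=0\big)^k \overset{(b)}{=} q^{-k},\tag{18}$$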

where (a) follows from the fact that the $H_a$ are i.i.d. matrices. Step (b) follows from the fact that $Z-X\neq0$ and that every non-zero element in a finite field has a (unique) multiplicative inverse, so $\mathbb{P}(\langle Z-X,H_1\rangle=0)=q^{-1}$ ([29], [26]). More precisely, $\langle Z-X,H_1\rangle$ is uniformly distributed on $\mathbb{F}_q$ by the independence and uniformity of the elements of $H_1$. Since $r/n\to\gamma$, for any fixed $\eta'>0$ we have $|r/n-\gamma|<\eta'$ for all $n$ sufficiently large. By the uniform continuity of the function $t\mapsto2t-t^2$ on $t\in[0,1]$, for any $\eta>0$ we have $|(2nr-r^2)/n^2-2\gamma(1-\gamma/2)|<\eta$ for all $n>N_\eta$, where $N_\eta$ is an integer depending only on $\eta$. Now, by combining (18) with the union of events bound,

$$\mathbb{P}(\mathcal{E}_n) \le \sum_{\substack{Z:\,Z\neq X\\ \operatorname{rank}(Z)\le\operatorname{rank}(X)}} q^{-k} \overset{(c)}{\le} \Phi_q(n,r)\,q^{-k} \overset{(d)}{\le} 4q^{2nr-r^2-k} \overset{(e)}{\le} 4q^{-n^2[-2\gamma(1-\gamma/2)-\eta+k/n^2]},\tag{19}$$

where (c) follows because $\operatorname{rank}(X)\le r$, (d) follows from the upper bound in (7), and (e) follows for all $n$ sufficiently large, as argued above. Thus, if $k$ satisfies (15), the exponent in (19) is positive provided we choose $\eta$ small enough that $\eta<\varepsilon\gamma(1-\gamma/2)$. Hence $\mathbb{P}(\mathcal{E}_n)\to0$, as desired. ■
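Step (b) of (18) is easy to confirm numerically. The sketch below assumes a prime $q$, so that integers modulo $q$ form $\mathbb{F}_q$. It flattens $Z-X$ into a tuple and enumerates every possible sensing matrix to check that $\mathbb{P}(\langle Z-X,H_1\rangle=0)=1/q$ for any non-zero difference, and hence that $\mathbb{P}(\mathcal{A}_Z)=q^{-k}$. The function names are hypothetical.

```python
import itertools
from fractions import Fraction
from typing import Sequence


def prob_inner_zero(D: Sequence[int], q: int) -> Fraction:
    """Exact P(<D, H> = 0) when H has i.i.d. uniform entries in F_q (q prime).

    D is the flattened difference Z - X, with entries taken mod q.
    """
    hits = sum(
        1
        for H in itertools.product(range(q), repeat=len(D))
        if sum(d * h for d, h in zip(D, H)) % q == 0
    )
    return Fraction(hits, q ** len(D))


def prob_event_A(D: Sequence[int], q: int, k: int) -> Fraction:
    """P(A_Z) in (18): the k sensing matrices are i.i.d., so the probabilities multiply."""
    return prob_inner_zero(D, q) ** k


if __name__ == "__main__":
    q, k = 3, 4
    D = (1, 0, 0, 2)              # a non-zero 2x2 difference Z - X, flattened
    print(prob_inner_zero(D, q))  # 1/3
    print(prob_event_A(D, q, k))  # 1/81
```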
Remark: Here and in the following, we can, without loss of generality, assume that $r=\lfloor\gamma n\rfloor$ (in place of $r/n\to\gamma$). In this way, we can remove the effect of the small positive constant $\eta$ in the above argument. This simplification does not affect the precision of any of the arguments in the sequel.
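Steps (c) and (d) of (19) rest on counting low-rank matrices. As a sanity check, the sketch below evaluates the standard closed-form count of $n\times n$ matrices of rank exactly $r$ over $\mathbb{F}_q$ and compares it with brute-force enumeration over $\mathbb{F}_2$ for $n=3$. It also verifies, for small $n$ and $q$, the two-sided estimate $q^{2nr-r^2-2r}\le\#\{\text{rank-}r\}\le4q^{2nr-r^2}$. These are the bounds that reappear in Lemma 12 once the factor $q^{-k}$ is removed. Identifying them with the bounds in (7) is our reading of the text, not a quotation of it, and the function names are hypothetical.

```python
from typing import List


def num_rank_r(n: int, r: int, q: int) -> int:
    """Number of n x n matrices over F_q of rank exactly r (standard product formula)."""
    num, den = 1, 1
    for i in range(r):
        num *= (q ** n - q ** i) ** 2
        den *= q ** r - q ** i
    return num // den  # exact: the count is an integer


def rank_gf2(rows: List[int]) -> int:
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    for col in range(max((x.bit_length() for x in rows), default=0)):
        piv = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def brute_force_counts(n: int) -> List[int]:
    """Rank histogram of all 2^(n^2) binary n x n matrices (feasible only for n <= 4)."""
    counts = [0] * (n + 1)
    mask = (1 << n) - 1
    for m in range(1 << (n * n)):
        counts[rank_gf2([(m >> (i * n)) & mask for i in range(n)])] += 1
    return counts


def within_bounds(n: int, r: int, q: int) -> bool:
    """Check q^(2nr - r^2 - 2r) <= #rank-r matrices <= 4 q^(2nr - r^2); needs n >= 2."""
    count = num_rank_r(n, r, q)
    return q ** (2 * n * r - r * r - 2 * r) <= count <= 4 * q ** (2 * n * r - r * r)


if __name__ == "__main__":
    print(brute_force_counts(3))                    # [1, 49, 294, 168]
    print([num_rank_r(3, r, 2) for r in range(4)])  # [1, 49, 294, 168]
```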

B. The Reliability Function 

We have shown in the previous subsection that the min-rank decoder is asymptotically optimal in the following sense: the number of measurements it needs to decode $X$ reliably, with $\mathbb{P}(\mathcal{E}_n)\to0$, matches the lower bound (necessary condition) on $k$ in Proposition 2. It is also interesting to analyze the rate of decay of $\mathbb{P}(\mathcal{E}_n)$ for the min-rank decoder. For this purpose, we define the rate $R$ of the measurement model.

Definition 1. The rate of (a sequence of) linear measurement models as in (1) is defined as

$$R := \lim_{n\to\infty}\frac{n^2-k}{n^2} = \lim_{n\to\infty}\Big(1-\frac{k}{n^2}\Big),\tag{20}$$

assuming the limit exists. Note that $R\in[0,1]$.

The use of the term rate is in direct analogy to its use in coding theory. The rate of the linear code

$$\mathcal{C} := \{C\in\mathbb{F}_q^{n\times n} : \langle C,H_a\rangle=0,\ a\in[k]\}\tag{21}$$

is $R_n := 1-\dim(\operatorname{span}\{\operatorname{vec}(H_1),\ldots,\operatorname{vec}(H_k)\})/n^2$, which is lower bounded⁶ by $1-k/n^2$ for every $k=0,1,\ldots,n^2$.

6 The lower bound is achieved when the vectors $\operatorname{vec}(H_1),\ldots,\operatorname{vec}(H_k)$ are linearly independent over $\mathbb{F}_q$. See Section VIII and, in particular, Proposition 14 for details when the sensing matrices are random.

We revisit the connection of the rank minimization problem to coding theory (and in particular to rank-metric codes) in detail in Section VII.



Definition 2. If the limit exists, the reliability function or error exponent of the min-rank decoder (12) is defined as

$$E(R) := \lim_{n\to\infty}-\frac{1}{n^2}\log_q\mathbb{P}(\mathcal{E}_n).\tag{22}$$



We show in Corollary 7 that the limit in (22) indeed exists. Unlike the usual definition of the reliability function [37, Eq. (5.8.8)], the normalization in (22) is $1/n^2$ since $X$ is an $n\times n$ matrix⁷. Also, we restrict our attention to the min-rank decoder. The following proposition provides an upper bound on the reliability function of the min-rank decoder when there is no noise in the measurements, as in (1).

Proposition 4 (Upper bound on $E(R)$). Assume that $\operatorname{rank}(X)/n\to\gamma$ as $n\to\infty$. Under the uniform measurement model in (14) and assuming the min-rank decoder is used,

$$E(R) \le \big|(1-R)-2\gamma(1-\gamma/2)\big|^+.\tag{23}$$



The proof of this result hinges on the pairwise independence of the events $\mathcal{A}_Z$ and on de Caen's inequality [23], which we restate here for the reader's convenience:

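Lemma 5 (de Caen [23]). Let $(\Omega,\mathcal{F},\mathbb{Q})$ be a probability space. For a finite number of events $B_1,\ldots,B_M\in\mathcal{F}$, the probability of their union can be lower bounded as

$$\mathbb{Q}\Big(\bigcup_{m=1}^{M}B_m\Big) \ge \sum_{m=1}^{M}\frac{\mathbb{Q}(B_m)^2}{\sum_{m'=1}^{M}\mathbb{Q}(B_m\cap B_{m'})}.\tag{24}$$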
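De Caen's bound (24) can be exercised on a small finite probability space. The sketch below represents events as Python `frozenset`s and uses exact arithmetic via `fractions.Fraction` to evaluate the right-hand side of (24). The names are illustrative. In the usage example, three events built from two fair bits are pairwise but not jointly independent, which is the same structure as the events $\mathcal{A}_Z$ in the proof below. For these three events the bound holds with equality.

```python
from fractions import Fraction
from typing import Callable, FrozenSet, List

Event = FrozenSet[int]


def de_caen_lower_bound(events: List[Event], prob: Callable[[Event], Fraction]) -> Fraction:
    """Right-hand side of (24): sum_m Q(B_m)^2 / sum_m' Q(B_m & B_m')."""
    total = Fraction(0)
    for Bm in events:
        pm = prob(Bm)
        if pm == 0:
            continue  # an empty event contributes nothing
        total += pm * pm / sum(prob(Bm & Bmp) for Bmp in events)
    return total


def uniform(size: int) -> Callable[[Event], Fraction]:
    """Uniform probability measure Q on Omega = {0, ..., size - 1}."""
    return lambda B: Fraction(len(B), size)


if __name__ == "__main__":
    # Outcome 2*b1 + b2 for two fair bits; the third event is "b1 XOR b2 = 1".
    Q = uniform(4)
    B = [frozenset({2, 3}), frozenset({1, 3}), frozenset({1, 2})]
    print(de_caen_lower_bound(B, Q), Q(frozenset().union(*B)))  # 3/4 3/4
```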



We now prove Proposition 4.

Proof: To apply (24) to the error probability in (17), we need the probabilities $\mathbb{P}(\mathcal{A}_Z)$ and $\mathbb{P}(\mathcal{A}_Z\cap\mathcal{A}_{Z'})$. The former is $q^{-k}$, as argued in (18). The latter uses the following lemma, which is proved in Appendix A.

Lemma 6 (Pairwise Independence). For any two distinct matrices $Z$ and $Z'$, neither of which is equal to $X$, the events $\mathcal{A}_Z$ and $\mathcal{A}_{Z'}$ (defined in (16)) are independent.

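As a result of this lemma, $\mathbb{P}(\mathcal{A}_Z\cap\mathcal{A}_{Z'})=\mathbb{P}(\mathcal{A}_Z)\mathbb{P}(\mathcal{A}_{Z'})=q^{-2k}$ if $Z\neq Z'$, and $\mathbb{P}(\mathcal{A}_Z\cap\mathcal{A}_{Z'})=\mathbb{P}(\mathcal{A}_Z)=q^{-k}$ if $Z=Z'$. We now apply the lower bound (24) to $\mathbb{P}(\mathcal{E}_n)$, noting from (17) that $\mathcal{E}_n$ is the union of all $\mathcal{A}_Z$ such that $Z\neq X$ and $\operatorname{rank}(Z)\le\tilde r:=\operatorname{rank}(X)$. Then, for a fixed $\eta>0$, we have

$$\begin{aligned}\mathbb{P}(\mathcal{E}_n) &\ge \sum_{\substack{Z:\,Z\neq X\\ \operatorname{rank}(Z)\le\operatorname{rank}(X)}}\frac{q^{-k}}{1+\sum_{Z':\,Z'\neq X,\,Z'\neq Z,\,\operatorname{rank}(Z')\le\operatorname{rank}(X)}q^{-k}}\\ &\overset{(a)}{\ge}\frac{\big(q^{2n\tilde r-\tilde r^2-2\tilde r}-1\big)q^{-k}}{1+4q^{2n\tilde r-\tilde r^2-k}} \overset{(b)}{\ge}\frac{q^{n^2[2\gamma(1-\gamma/2)-\eta-k/n^2]}-q^{-k}}{1+4q^{n^2[2\gamma(1-\gamma/2)+\eta-k/n^2]}},\end{aligned}$$

where (a) is from the upper and lower bounds in (7), and (b) holds for all $n$ sufficiently large since $\tilde r/n\to\gamma$ (see the argument justifying inequality (e) in (19)). Assuming that $1-R>2\gamma(1-\gamma/2)$, the normalized logarithm of the error probability can now be simplified as

$$\limsup_{n\to\infty}-\frac{1}{n^2}\log_q\mathbb{P}(\mathcal{E}_n) \le -2\gamma(1-\gamma/2)+\eta+\lim_{n\to\infty}\frac{k}{n^2},\tag{25}$$

where we used the fact that $4q^{n^2[2\gamma(1-\gamma/2)+\eta-k/n^2]}\to0$ for sufficiently small $\eta>0$. The case $1-R\le2\gamma(1-\gamma/2)$ results in $E(R)=0$ because $\mathbb{P}(\mathcal{E}_n)$ fails to converge to zero as $n\to\infty$. The proof of the upper bound on the reliability function is completed by appealing to the definition of $R$ in (20) and the arbitrariness of $\eta>0$. ■

7 The "block-length" of the code $\mathcal{C}$ in (21) is $n^2$.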

Corollary 7 (Reliability function). Under the assumptions of Proposition 4, the error exponent of the min-rank decoder is

$$E(R) = \big|(1-R)-2\gamma(1-\gamma/2)\big|^+.\tag{26}$$



Proof: The lower bound on $E(R)$ follows from the achievability result in (19), which may be strengthened as follows:

$$\mathbb{P}(\mathcal{E}_n) \le 4q^{-n^2\left|-2\gamma(1-\gamma/2)-\eta+k/n^2\right|^+}\tag{27}$$

since $\mathbb{P}(\mathcal{E}_n)$ can also be upper bounded by unity. Now, because $|\cdot|^+$ is continuous, the lower limit of the normalized logarithm of the bound in (27) can be expressed as follows:

$$\liminf_{n\to\infty}-\frac{1}{n^2}\log_q\mathbb{P}(\mathcal{E}_n) \ge \Big|-2\gamma(1-\gamma/2)-\eta+\lim_{n\to\infty}\frac{k}{n^2}\Big|^+.\tag{28}$$

Combining the upper bound in Proposition 4 with the lower bound in (28), and noting that $\eta>0$ is arbitrary, yields the reliability function in (26). ■
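Since $2\gamma(1-\gamma/2)=1-(1-\gamma)^2$, (26) can equivalently be written as $E(R)=|(1-\gamma)^2-R|^+$, which makes the critical rate $(1-\gamma)^2$ explicit. The short sketch below evaluates (26) together with the finite-$n$ rate $1-k/n^2$ from (20). The function names are hypothetical.

```python
def reliability_min_rank(R: float, gamma: float) -> float:
    """E(R) = |(1 - R) - 2*gamma*(1 - gamma/2)|^+ as in (26)."""
    return max((1.0 - R) - 2.0 * gamma * (1.0 - gamma / 2.0), 0.0)


def rate(k: int, n: int) -> float:
    """Finite-n version of (20): R = 1 - k / n^2."""
    return 1.0 - k / n ** 2


if __name__ == "__main__":
    n, gamma, k = 100, 0.05, 1500
    R = rate(k, n)                       # 0.85
    E = reliability_min_rank(R, gamma)  # about 0.0525
    print(R, E)
```

For these parameters, definition (22) suggests, heuristically, that $\mathbb{P}(\mathcal{E}_n)$ behaves like $q^{-n^2E(R)}$ up to sub-exponential factors. With $q=2$ this is roughly $2^{-525}$.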

We observe that pairwise independence of the events $\mathcal{A}_Z$ (Lemma 6) is essential in the proof of Proposition 4. Pairwise independence is a consequence of the linear measurement model in (1) and the uniformity assumption in (14). Note that the events $\mathcal{A}_Z$ are not jointly (nor triple-wise) independent. The strength of de Caen's bound is that pairwise independence alone suffices to lower bound $\mathbb{P}(\mathcal{E}_n)$, and thus to obtain a tight upper bound on $E(R)$. By analogy, only pairwise independence is required to show that linear codes achieve capacity in symmetric DMCs. De Caen's inequality carries this use of pairwise independence into the error exponent domain, allowing statements about the error exponent behavior of ensembles of linear codes.

A natural question arises: Is $E(R)$ given in (26) the largest possible exponent over all decoders $\hat{X}(\cdot)$ for the model in which $H_a$ follows the uniform pmf? We conjecture that this is indeed the case, but a proof remains elusive.

1) Comparison of error exponents to existing works [18]: As mentioned in the Introduction, the preceding results can be interpreted from a coding-theoretic perspective, and we do so in Section VII. In this subsection, we compare the reliability function derived in Corollary 7 with three other coding techniques from the literature:

- the well-known construction of maximum rank distance (MRD) codes by Gabidulin [7];
- the error trapping technique [18] alluded to in Section II-A1;
- a combination of the two preceding code constructions, which is discussed in [18, Sec. VI.E].

To perform this comparison, we define another reliability function $E_1(R)$ that is "normalized by $n$". This is simply the quantity in (22) with normalization $1/n$ instead of $1/n^2$. We now denote the reliability function normalized by $n^2$ as in (22) by $E_2(R)$, and we use superscripts on $E_1$ and $E_2$ to distinguish coding schemes. Hence, for our encoding and decoding strategy using random sensing and min-rank decoding (RSMR), $E_1^{\mathrm{RSMR}}(R)=\infty$ for all $R<(1-\gamma)^2$, and $E_2^{\mathrm{RSMR}}(R)$ is given by (26).

Since Gabidulin codes are MRD, they achieve the Singleton bound for rank-metric codes ([12, Section III]), given by $n^2-k\le n(n-d_R+1)$, where $d_R$ is the minimum rank distance of the code in (21) [see the exact definitions in (48) and (49)]. Thus, it can be verified that for $j=1,2$,

$$E_j^{\mathrm{Gab}}(R) = \begin{cases}\infty, & R<1-2\gamma,\\ 0, & \text{else.}\end{cases}\tag{29}$$



From [18, Section IV.B, Eq. (12)], it can also be checked that for the error trapping coding strategy, assuming the low-rank error matrix is uniformly distributed over those of rank $r$,

$$E_1^{\mathrm{ET}}(R) = \big|1-\sqrt{R}-\gamma\big|^+,\qquad E_2^{\mathrm{ET}}(R) = 0.\tag{30}$$



Finally, from [18, Section VI.E], for the combination of Gabidulin coding and error trapping, under the same uniformity condition,

$$E_1^{\mathrm{GabET}}(R) = 1-\gamma-\frac{R}{1-\gamma},\qquad E_2^{\mathrm{GabET}}(R) = 0.\tag{31}$$

Note that for the error exponents in (29), (30) and (31), the randomness is over the low-rank error matrix $X$ and not over the code construction, which is deterministic. In contrast, our coding strategy RSMR involves a random encoding scheme. It can be seen from (29) to (31) that there is a non-trivial interval of rates $\Omega:=[1-2\gamma,(1-\gamma)^2)$ in which our reliability functions $E_1^{\mathrm{RSMR}}(R)$ and $E_2^{\mathrm{RSMR}}(R)$ are the best (largest). Indeed, in the interval $\Omega$, $E_1^{\mathrm{RSMR}}(R)=\infty$, and our result in (26) implies that $E_2^{\mathrm{RSMR}}(R)>0$, whereas all the abovementioned coding schemes give $E_2(R)=0$. Thus, using a random code for encoding together with min-rank decoding is advantageous from a reliability function standpoint in the regime $R\in\Omega$. Furthermore, as we shall see from (40) in Section VI, which deals with the sparse sensing setting (SRSMR), $E_1^{\mathrm{SRSMR}}(R)=\infty$ and $E_2^{\mathrm{SRSMR}}(R)=0$ for all $R<(1-\gamma)^2$. Such an encoding scheme using sparse parity-check matrices may be amenable to the design of low-complexity decoding strategies that also have good error exponent properties. In general, though, our min-rank decoder requires exhaustive search (although Section VIII proposes techniques to reduce the search space), while all the preceding techniques have polynomial-time decoding complexity.
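The rate interval $\Omega$ can be tabulated directly from (26) and (29). In the sketch below, $E_1^{\mathrm{RSMR}}(R)=\infty$ for $R<(1-\gamma)^2$ follows the text. Setting $E_1^{\mathrm{RSMR}}(R)=0$ above that rate is our assumption, since $\mathbb{P}(\mathcal{E}_n)$ no longer vanishes there. The error-trapping exponents (30) and (31) are not modeled. All names are hypothetical.

```python
import math
from typing import Tuple


def rsmr_exponents(R: float, gamma: float) -> Tuple[float, float]:
    """(E_1, E_2) for random sensing + min-rank decoding (RSMR).

    E_1 = inf below (1 - gamma)^2 as stated in the text; E_1 = 0 above it is an
    assumption of this sketch. E_2 is (26), rewritten as |(1 - gamma)^2 - R|^+.
    """
    threshold = (1.0 - gamma) ** 2
    E1 = math.inf if R < threshold else 0.0
    E2 = max(threshold - R, 0.0)
    return E1, E2


def gabidulin_exponent(R: float, gamma: float) -> float:
    """E_j^Gab(R) from (29), identical for j = 1, 2."""
    return math.inf if R < 1.0 - 2.0 * gamma else 0.0


def advantage_interval(gamma: float) -> Tuple[float, float]:
    """Omega = [1 - 2*gamma, (1 - gamma)^2), where RSMR has E_2 > 0 but (29) gives 0."""
    return 1.0 - 2.0 * gamma, (1.0 - gamma) ** 2


if __name__ == "__main__":
    gamma = 0.1
    lo, hi = advantage_interval(gamma)  # roughly [0.8, 0.81)
    R = 0.5 * (lo + hi)
    print(rsmr_exponents(R, gamma), gabidulin_exponent(R, gamma))
```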

V. Uniformly Random Sensing Matrices: The Noisy Case

We now generalize the noiseless model and the accompanying results of Section IV to the case where the measurements $y^k$ are noisy, as in (3). As in Section IV, we assume that the elements of $H_a$ are i.i.d. and uniform in $\mathbb{F}_q$. In Section V-A, the noise $w$ is assumed to be deterministic but unknown. In Section V-B, we extend our results to the situation where $w$ is a random vector.



A. Deterministic Noise 

In the deterministic setting, we assume that $\|w\|_0=\lfloor\sigma n^2\rfloor$ for some noise level $\sigma\in(0,k/n^2]$. Instead of using the minimum entropy decoder as in [29] (also see [36]), we consider the following generalization of the min-rank decoder:

$$\begin{aligned}\text{minimize}\quad & \operatorname{rank}(X)+\lambda\|w\|_0\\ \text{subject to}\quad & \langle H_a,X\rangle+w_a=y_a,\quad a\in[k].\end{aligned}\tag{32}$$

The optimization variables are $X\in\mathbb{F}_q^{n\times n}$ and $w\in\mathbb{F}_q^k$. The parameter $\lambda=\lambda_n>0$ governs the tradeoff between the rank of the matrix $X$ and the sparsity of the vector $w$. Let $H_q(p):=-p\log_q(p)-(1-p)\log_q(1-p)$ be the (base-$q$) binary entropy.
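For tiny instances, (32) can also be solved by brute force, because the constraint pins down $w$ once $X$ is fixed: $w_a=y_a-\langle H_a,X\rangle$. The objective therefore becomes $\operatorname{rank}(X)+\lambda\cdot\#\{a:\langle H_a,X\rangle\neq y_a\}$. The sketch below makes three assumptions. It takes $q=2$, it stores matrices as row-major bitmasks, and, to obtain a checkable toy example, it uses deterministic "standard basis" sensing matrices (one per entry) instead of the uniform ensemble. The parameter $\lambda$ is kept as an exact fraction, and all names are illustrative.

```python
from fractions import Fraction
from typing import List


def rank_gf2(rows: List[int]) -> int:
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    for col in range(max((x.bit_length() for x in rows), default=0)):
        piv = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def regularized_decode(H: List[int], y: List[int], n: int, lam: Fraction) -> List[int]:
    """Exhaustive solver for (32) over F_2; returns every minimizing X (as a bitmask).

    For fixed X the constraint forces w_a = y_a - <H_a, X>, so ||w||_0 is the
    number of measurements that X fails to explain.
    """
    mask = (1 << n) - 1
    best, S = None, []
    for m in range(1 << (n * n)):
        weight = sum((bin(m & h).count("1") & 1) != ya for h, ya in zip(H, y))
        obj = rank_gf2([(m >> (i * n)) & mask for i in range(n)]) + lam * weight
        if best is None or obj < best:
            best, S = obj, [m]
        elif obj == best:
            S.append(m)
    return S


if __name__ == "__main__":
    n = 3
    X = 0b111_111_111                   # all-ones matrix, rank 1
    H = [1 << a for a in range(n * n)]  # toy sensing: H_a picks out one entry
    y = [1] * (n * n)                   # noiseless measurements of X
    y[0] ^= 1                           # corrupt one measurement (||w||_0 = 1)
    print(regularized_decode(H, y, n, Fraction(1, n)) == [X])  # True
```

Here, with $\lambda=1/n$, the decoder still returns $X$ after the corruption. The corrupted observation matrix itself has rank 2, so attributing the flipped entry to $w$ (total cost $1+1/3$) is cheaper than changing the estimate of $X$.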

Proposition 8 (Achievability under deterministic noisy measurement model). Fix $\varepsilon>0$ and choose $\lambda=1/n$. Assume the uniform measurement model and that $\|w\|_0=\lfloor\sigma n^2\rfloor$. If

$$k > \frac{(3+\varepsilon)(\gamma+\sigma)\left[1-(\gamma+\sigma)/3\right]}{1-H_2\!\left[1/(3-(\gamma+\sigma))\right]\log_q 2}\,n^2,\tag{33}$$

then $\mathbb{P}(\mathcal{E}_n)\to0$ as $n\to\infty$.

The proof of this proposition is provided in Appendix B. Since the prefactor in (33) is a monotonically increasing function of the noise level $\sigma$, the number of measurements increases as $\sigma$ increases, agreeing with intuition. Note that the regularization parameter $\lambda$ is chosen to be $1/n$ and is thus independent of $\sigma$. Hence, the decoder does not need to know the true value of the noise level $\sigma$. The factor of 3 (instead of 2) in (33) arises in part due to the uncertainty in the locations of the non-zero elements of the noise vector $w$. We remark that Proposition 8 does not reduce to the noiseless case ($\sigma=0$) in Proposition 3 because we assumed a different measurement model in (3) and employed a different bounding technique.

The measurement complexity in (33) is suboptimal, i.e., it does not match the converse in (9). This is because the decoder in (32) estimates both the matrix $X$ and the noise $w$, whereas the converse is concerned only with reconstructing the unknown matrix $X$. Decoding $(X,w)$ jointly allows the analysis to proceed along the lines of the proof of Proposition 3. It is unclear whether a better parameter-free decoding strategy exists in the presence of noise, and whether such a strategy would also be amenable to analysis. The noisy setting was also analyzed in [31] but, as in our work, the number of measurements for achievability does not match the converse.

B. Random Noise 

We now consider the case where the noise in (3) is random, i.e., $w=(w_1,\ldots,w_k)\in\mathbb{F}_q^k$ is a random vector. We assume the noise vector $w$ is i.i.d. and each component is distributed according to any pmf for which

$$P_w(w;p)=1-p\quad\text{if } w=0.\tag{34}$$

This pmf represents a noisy channel in which every symbol is changed, independently, to some other (different) symbol with crossover probability $p\in(0,1/2)$. We can ask how many measurements are necessary and sufficient for recovering a fixed $X$ in the presence of the additive stochastic noise $w$. We are also interested in how this measurement complexity depends on $p$. We leverage Propositions 2 and 8 to derive a converse result and an achievability result, respectively. We start with the converse, which is partially inspired by Theorem 3 in [31].






Corollary 9 (Converse under random noise model). Assume the setup in Proposition 2 and consider the noisy measurement model given by (3) and (34). Additionally, assume that $X$, $H^k$ and $w$ are jointly independent. If

$$k < \frac{(2-\varepsilon)\gamma(1-\gamma/2)}{1-H_q(p)}\,n^2,\tag{35}$$

then for any estimator, $\mathbb{P}(\mathcal{E}_n)>\varepsilon/4>0$ for all $n$ sufficiently large, where $\mathcal{E}_n$ is defined in (13).

Note that the probability of error $\mathbb{P}(\mathcal{E}_n)$ above is computed over both the randomness in the sensing matrices $H_a$ and the randomness in the noise $w$. The proof is given in Appendix C. From (35), the number of measurements must increase by a factor of $1/(1-H_q(p))$ for reliable recovery. As expected, for a fixed $q$, the larger the crossover probability $p\in(0,1/2)$, the more measurements are required. The converse is illustrated for different parameter settings in Figs. 1 and 2.

To present our achievability result compactly, we assume that $k=\lceil\alpha n^2\rceil$ for some scaling parameter $\alpha\in(0,1)$, i.e., the number of observations is proportional to $n^2$ and the constant of proportionality is $\alpha$. We would like to find the range of values of $\alpha$ for which reliable recovery is possible. Recall that the upper bound on the rank is $r$ and that the noise vector has expected weight $pk\approx p\alpha n^2$.

Corollary 10 (Achievability under random noisy measurement model). Fix $\varepsilon>0$ and choose $\lambda=1/n$. Assume the uniform measurement model and that $k=\lceil\alpha n^2\rceil$. Define the function

$$g(\alpha;p,\gamma) := \alpha\Big[1-(\log_q 2)H_2\Big(p+\frac{\gamma}{\alpha}\Big)-2p(1-\gamma)\Big]+\alpha^2p^2.\tag{36}$$

If the tuple $(\alpha,p,\gamma)$ satisfies the inequality

$$g(\alpha;p,\gamma) > (2+\varepsilon)\gamma(1-\gamma/2),\tag{37}$$

then $\mathbb{P}(\mathcal{E}_n)\to0$ as $n\to\infty$.



The proof of this corollary uses typicality arguments and is presented in Appendix D. As in the deterministic noise setting, the sufficient condition in (37) does not reduce to the noiseless case ($p=0$) in Proposition 3. It also does not match the converse in (35). This is due to the different bounding technique employed to prove Corollary 10 [both $X$ and $w$ are decoded in (32)]. In addition, the inequality in (37) does not admit an analytical solution for $\alpha$. Hence, we search numerically for the critical $\alpha$ [the minimum one satisfying (37)] for several parameter settings. Figs. 1 and 2 illustrate how the critical $\alpha$ varies with $(p,\gamma)$ when the field size is small ($q=2$) and when it is large ($q=256$).



Fig. 1. Plot of $\alpha_{\mathrm{crit}}$ against $p$ for $q=2$ and $\gamma\in\{0.025, 0.050, 0.075\}$. Both $\alpha_{\mathrm{crit}}$ for the converse (con) in (35) and for the achievability (ach) in (37) are shown. All $\alpha$'s below the converse curves are not achievable.

Fig. 2. Plot of $\alpha_{\mathrm{crit}}$ against $p$ for $q=256$. See Fig. 1 for the legend.



From Fig. 1, we observe that noise causes a significant increase in the critical value of the scaling parameter $\alpha$ when $q=2$. For a rank-dimension ratio of $\gamma=0.05$ and a crossover probability of $p=0.02$, the critical scaling parameter is $\alpha_{\mathrm{crit}}\approx0.32$. Contrast this with the noiseless case (Proposition 3) and with the converse result for the noisy case (Corollary 9). These stipulate critical scaling parameters of $2\gamma(1-\gamma/2)\approx0.098$ and $2\gamma(1-\gamma/2)/(1-H_2(p))\approx0.114$, respectively. Hence, we incur roughly a threefold increase in the number of measurements to tolerate a noise level of $p=2\%$. This phenomenon arises because we do not know the locations of the non-zero elements of $w$ (and hence which measurements $y_a$ are reliable). In contrast to the reals, the finite field setting has no notion of the "size" of the noise (per measurement). Hence, estimation performance in the presence of noise does not degrade as gracefully as in the reals (cf. [6, Theorem 1.2]). However, when the field size is large (and thus more akin to the reals), the degradation is less severe, as depicted in Fig. 2. Under the same settings as above, $\alpha_{\mathrm{crit}}\approx0.114$, which is not far from the converse value $2\gamma(1-\gamma/2)/(1-H_{256}(p))\approx0.099$.

Fig. 3. Plot of $\alpha_{\mathrm{crit}}$ against $\log_2(q)$ for our work (TBD, Corollary 10), the converse in Corollary 9, and Emad and Milenkovic (EM) [31].
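The converse values quoted above follow in closed form from (35). The sketch below evaluates $2\gamma(1-\gamma/2)$ and $2\gamma(1-\gamma/2)/(1-H_q(p))$, using the base-$q$ binary entropy defined in Section V-A. It does not attempt the numerical search over (37) that the achievability curve requires. Function names are hypothetical.

```python
import math


def entropy_q(p: float, q: int) -> float:
    """Base-q binary entropy H_q(p) = -p log_q p - (1 - p) log_q (1 - p)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p)) / math.log(q)


def alpha_noiseless(gamma: float) -> float:
    """Critical scaling 2*gamma*(1 - gamma/2) from Propositions 2 and 3."""
    return 2.0 * gamma * (1.0 - gamma / 2.0)


def alpha_converse(gamma: float, p: float, q: int) -> float:
    """Scaling below which (35) rules out reliable recovery."""
    return alpha_noiseless(gamma) / (1.0 - entropy_q(p, q))


if __name__ == "__main__":
    gamma, p = 0.05, 0.02
    print(round(alpha_noiseless(gamma), 4))         # 0.0975
    print(round(alpha_converse(gamma, p, 2), 3))    # 0.114
    print(round(alpha_converse(gamma, p, 256), 3))  # 0.099
```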

As a final remark, we compare our decoder for the noisy model in (32) with that in [31]. In [31], the authors considered the analog of the following decoder (for tensors):

$$\begin{aligned}\text{minimize}\quad & \operatorname{rank}(X)\\ \text{subject to}\quad & \|y_X-y\|_0\le\tau,\end{aligned}\tag{38}$$

where $y_X:=[\langle H_1,X\rangle\ \ldots\ \langle H_k,X\rangle]^T$ and $y=y^k$ is the noisy observation vector in (3). However, the threshold $\tau$ that constrains the Hamming distance between $y_X$ and $y$ is not straightforward to choose⁸. Our decoder, in contrast, is parameter-free because the regularization constant $\lambda$ in (32) can be chosen to be $1/n$, independent of all other parameters. In addition, Fig. 3 shows that at high $q$, our decoder and analysis yield a better (smaller) $\alpha_{\mathrm{crit}}$ than that in [31]. At high $q$, our decoding scheme gives a bound that is closer to the converse, while the bound for the scheme in [31] is farther from it. The slight disadvantage of our decoder is that the number of measurements in (37) cannot be expressed in closed form.

VI. Sparse Random Sensing Matrices 

In the previous two sections, we focused exclusively on the case where the elements of the sensing matrices $H_a$, $a\in[k]$, are drawn uniformly from $\mathbb{F}_q$. However, there is substantial motivation to consider other ensembles of sensing matrices.

8 In fact, the achievability result of Theorem 4 in [31] requires $\tau=\tau_1k$, where $\tau_1\in(p,(q-1)/q)$. For our optimization program in (32), by contrast, the decoder does not need to know the crossover probability $p$.



For example, in low-density parity-check (LDPC) codes, the parity-check matrix (analogous to the set of $H_a$ matrices) is sparse. The sparsity aids decoding via the sum-product algorithm [39], as the resulting Tanner (factor) graph is sparse [26]. In [32], the authors considered the case where the generator matrices are sparse and random, but their setting is restricted to the BSC and BEC channel models.

In this section, we revisit the noiseless model in (1) and analyze the scenario where the sensing matrices are sparse on average. More precisely, each element of $H_a$, $a\in[k]$, is assumed to be an i.i.d. random variable with associated pmf

$$P_h(h;\delta,q) := \begin{cases}1-\delta, & h=0,\\ \delta/(q-1), & h\in\mathbb{F}_q\setminus\{0\}.\end{cases}\tag{39}$$

Note that if $\delta$ is small, then the probability that an entry in $H_a$ is zero is close to unity. Deriving a sufficient condition for reliable recovery is more challenging than in the equiprobable case, since (18) no longer holds (compare to Lemma 21). Roughly speaking, because $H_a$, $a\in[k]$, are sparse, the matrix $X$ is not sensed as much as in the equiprobable case and the measurements $y^k$ are less informative. In the rest of this section, we allow the sparsity factor $\delta$ to depend on $n$, but for ease of exposition we do not make this dependence explicit. The question we would like to answer is: How fast can $\delta$ decay with $n$ such that the min-rank decoder is still reliable for weak recovery?

Theorem 11 (Achievability under sparse measurement model). Fix $\varepsilon>0$ and let $\delta$ be any sequence in $\Omega(\frac{\log n}{n})\cap o(1)$. Under the sparse measurement model as in (39), if the number of measurements $k$ satisfies (15) for all $n>N_{\varepsilon,\delta}$, then $\mathbb{P}(\mathcal{E}_n)\to0$ as $n\to\infty$.

The proof of Theorem 11, our main result, is detailed in Appendix E. It uses a "splitting" technique to partition the set of misleading matrices $\{Z\neq X:\operatorname{rank}(Z)\le\operatorname{rank}(X)\}$ into those with low Hamming distance from $X$ and those with high Hamming distance from $X$.

Observe that the sparsity factor $\delta$ is allowed to tend to zero, albeit at a controlled rate of $\Omega(\frac{\log n}{n})$. Thus, each $H_a$ is allowed to have, on average, $\Omega(n\log n)$ non-zero entries (out of $n^2$ entries). This scaling is reminiscent of the number of trials required for success in the so-called coupon collector's problem. Indeed, it seems plausible that at least one entry in each row and one entry in each column of $X$ must be sensed (by a sensing matrix $H_a$) for the min-rank decoder to succeed. It is easy to see that if $\delta=o(\frac{\log n}{n})$, then w.h.p. $H_a$ has at least one row and one column of zero Hamming weight. Surprisingly, the number of measurements required in the $\delta=\Omega(\frac{\log n}{n})$-sparse sensing case is exactly the same as in Proposition 3, where the elements of $H_a$ are drawn uniformly at random from $\mathbb{F}_q$. In fact, it also matches the information-theoretic lower bound in Proposition 2 and hence is asymptotically optimal. In Section VII-B, we analyze this weak recovery sparse setting in greater detail (and explain why it works) by studying minimum distance properties of sparse parity-check rank-metric codes. The sparse scenario may be extended to the noisy case by combining the proof techniques of Proposition 8 and Theorem 11.

Two natural questions arise at this point. First, can the reliability function be computed for the min-rank decoder under the sparse measurement model? The events $\mathcal{A}_Z$, defined in (16), are no longer pairwise independent. Thus, it is not straightforward to compute $\mathbb{P}(\mathcal{A}_Z\cap\mathcal{A}_{Z'})$ as in the proof of Proposition 4. Further, de Caen's lower bound may not be as tight as in the case where the entries of the sensing matrices are drawn uniformly at random from $\mathbb{F}_q$. Our bounding technique for Theorem 11 only ensures that

$$\limsup_{n\to\infty}\frac{1}{n\log n}\log_q\mathbb{P}(\mathcal{E}_n)\le-C\tag{40}$$

for some non-trivial $C\in(0,\infty)$. Thus, instead of a speed⁹ of $n^2$ in the large-deviations upper bound, we have a speed of $n\log n$, because $\delta$ is allowed to decay to zero. Whether the speed $n\log n$ is optimal remains open. Second, is $\delta=\Omega(\frac{\log n}{n})$ the best (smallest) possible sparsity factor? Is there a fundamental tradeoff between the sparsity factor $\delta$ and (a bound on) the number of measurements $k$? We leave these questions for further research.
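The coupon-collector heuristic in the discussion of Theorem 11 can be quantified. Under (39), each entry of $H_a$ is non-zero with probability $\delta$. A given row is therefore entirely zero with probability $(1-\delta)^n$, and since rows are independent, some row is zero with probability $1-(1-(1-\delta)^n)^n$. The sketch below evaluates this probability and the union bound $2n(1-\delta)^n$ for a zero row or column, taking $\delta=c\ln n/n$. Using the natural log is an arbitrary choice that only rescales $c$. Names are hypothetical.

```python
import math


def prob_some_zero_row(n: int, delta: float) -> float:
    """P(an n x n matrix whose i.i.d. entries are non-zero w.p. delta has an all-zero row)."""
    p_zero_row = (1.0 - delta) ** n
    return 1.0 - (1.0 - p_zero_row) ** n


def union_bound_zero_line(n: int, delta: float) -> float:
    """Union bound on P(some all-zero row or column): 2n (1 - delta)^n, capped at 1."""
    return min(1.0, 2 * n * (1.0 - delta) ** n)


if __name__ == "__main__":
    n = 1000
    for c in (0.5, 1.0, 2.0):
        delta = c * math.log(n) / n  # average number of non-zeros: n^2 * delta = c n ln n
        print(c, n * n * delta, prob_some_zero_row(n, delta), union_bound_zero_line(n, delta))
```

For $n=1000$, $c=0.5$ leaves an all-zero row with probability essentially one, while $c=2$ pushes both quantities below $0.01$.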

VII. Coding-Theoretic Interpretations and Minimum Rank Distance Properties

This section is devoted to the coding-theoretic interpretations and analogs of the rank minimization problem in (12). In particular, we would like to understand the geometry of the random linear rank-metric codes that underpin the optimization problem in (12), for both the equiprobable ensemble in (14) and the sparse ensemble in (39).

As mentioned in the Introduction, there is a natural correspondence between the rank minimization problem and rank-metric decoding [7]–[12]. In the former, we solve a problem of the form (12). In the latter, the code $\mathcal{C}$ typically consists of length-$n$ vectors¹⁰ whose elements belong to the extension field $\mathbb{F}_{q^n}$, and these vectors in $\mathbb{F}_{q^n}^n$ belong to the kernel of some linear operator $\mathsf{H}$. A particular vector codeword $c\in\mathcal{C}$ is transmitted. The received word is $r=c+x$, where $x$ is assumed to be a low-rank "error" vector. (To define the rank of a vector, fix a basis of $\mathbb{F}_{q^n}$ over $\mathbb{F}_q$. The rank of $a\in\mathbb{F}_{q^n}^n$ is then the rank of the matrix $A\in\mathbb{F}_q^{n\times n}$ whose elements are the coefficients of $a$ in that basis. See [10, Sec. VI.A] for details of this isomorphic map.) The optimization problem for decoding $c$ given $r$ is then

$$\begin{aligned}\text{minimize}\quad & \operatorname{rank}(r-c)\\ \text{subject to}\quad & c\in\mathcal{C},\end{aligned}\tag{41}$$

which is identical to the min-rank problem in (12) once we identify the low-rank error vector $x=r-c$. Note that the matrix version of the vector $r$ (assuming a fixed basis), denoted as $R$, satisfies the linear constraints in (1). Since the assignment $(A,B)\mapsto\operatorname{rank}(A-B)$ is a metric on the space of matrices [10, Sec. II.B], the problem in (41) can be interpreted as a minimum (rank) distance decoder.
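The vector-to-matrix map and the rank distance can be illustrated in the smallest non-trivial case, $q=2$ and $n=2$. The sketch below assumes $\mathbb{F}_4=\mathbb{F}_2[x]/(x^2+x+1)$ with basis $\{1,x\}$ and stores each element as a 2-bit integer holding its coefficients. It takes row $i$ of the coefficient matrix to be the coefficients of the $i$-th entry. This is one of the two possible orientations, and transposition does not change the rank. The sketch then checks the metric axioms for $(a,b)\mapsto\operatorname{rank}(a-b)$ by brute force over $\mathbb{F}_4^2$. Names are hypothetical.

```python
import itertools
from typing import List, Sequence


def rank_gf2(rows: List[int]) -> int:
    """Rank over F_2 of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    for col in range(max((x.bit_length() for x in rows), default=0)):
        piv = next((i for i in range(rank, len(rows)) if (rows[i] >> col) & 1), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for i in range(len(rows)):
            if i != rank and (rows[i] >> col) & 1:
                rows[i] ^= rows[rank]
        rank += 1
    return rank


def vector_rank(a: Sequence[int]) -> int:
    """Rank of a in F_4^n: row i of the coefficient matrix is the 2-bit code of a_i."""
    return rank_gf2(list(a))


def rank_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """rank(a - b); subtraction in characteristic 2 is entry-wise XOR of the codes."""
    return vector_rank([x ^ y for x, y in zip(a, b)])


def is_metric(n: int = 2) -> bool:
    """Brute-force check of the metric axioms for the rank distance on F_4^n."""
    vecs = list(itertools.product(range(4), repeat=n))
    for a, b in itertools.product(vecs, repeat=2):
        d = rank_distance(a, b)
        if d != rank_distance(b, a) or (d == 0) != (a == b):
            return False
    return all(rank_distance(a, c) <= rank_distance(a, b) + rank_distance(b, c)
               for a, b, c in itertools.product(vecs, repeat=3))


if __name__ == "__main__":
    print(vector_rank((1, 2)), vector_rank((1, 1)))  # 2 1
    print(is_metric())                               # True
```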

9 The term speed is in direct analogy to the theory of large deviations [40], where $P_n$ is said to satisfy a large-deviations upper bound with speed $a_n$ and rate function $J(\cdot)$ if $\limsup_{n\to\infty}\frac{1}{a_n}\log P_n(\mathcal{E})\le-\inf_{x\in\mathrm{cl}(\mathcal{E})}J(x)$.

10 We abuse notation by using a common symbol $\mathcal{C}$ to denote both a code consisting of vectors with elements in $\mathbb{F}_{q^n}$ and a code consisting of matrices with elements in $\mathbb{F}_q$.



A. Distance Properties of Equiprobable Rank-Metric Codes 

We formalize the notion of an equiprobable linear code and analyze its rank distance properties in this section. The results we derive here are the rank-metric analogs of the results in Barg and Forney [19]. They will prove useful in shedding light on the geometry behind the sufficient condition in Proposition 3 for recovering the unknown low-rank matrix $X$.

Definition 3. A rank-metric code is a non-empty subset of $\mathbb{F}_q^{n\times n}$ endowed with the rank distance $(A,B)\mapsto\operatorname{rank}(A-B)$.

Definition 4. We say that $\mathcal{C}\subseteq\mathbb{F}_q^{n\times n}$ is an equiprobable linear rank-metric code if

$$\mathcal{C} := \{C\in\mathbb{F}_q^{n\times n} : \langle C,H_a\rangle=0,\ a\in[k]\},\tag{42}$$

where $H_a$, $a\in[k]$, are random matrices in which each entry is statistically independent of the other entries and equiprobable in $\mathbb{F}_q$, i.e., has the pmf given in (14). Each matrix $C\in\mathcal{C}$ is called a codeword. Each matrix $H_a$ is said to be a parity-check matrix.

Recall that the inner product is defined as $\langle C,H_a\rangle=\operatorname{Tr}(CH_a^T)$. We reiterate that in the coding theory literature [7]–[12], rank-metric codes usually consist of length-$n$ vectors $c\in\mathcal{C}$ whose elements belong to the extension field $\mathbb{F}_{q^n}$. We refrain from adopting this approach here because we would like to make direct comparisons to the rank minimization problem, where the measurements are generated as in (1). Hence, the term codewords will always refer to matrices in $\mathcal{C}$.
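Footnote 11 explains how a parity-check constraint over $\mathbb{F}_{q^N}$ splits into $N$ trace constraints $\operatorname{Tr}(C\Omega_lH_a^T)=0$ over $\mathbb{F}_q$. The sketch below checks this exhaustively in a toy case of our own choosing: $q=2$, $N=n=2$, and $\mathbb{F}_4=\mathbb{F}_2[x]/(x^2+x+1)$ with basis $\{1,x\}$. It builds the symmetric matrices $\Omega_l$ from the multiplication table. It then confirms, for every pair $(h_a,c)$, that the $l$-th coefficient of $\sum_ih_{a,i}c_i$ equals $\operatorname{Tr}(C\Omega_lH_a^T)$. Names are hypothetical.

```python
import itertools
from typing import List

N = 2           # extension degree: F_{2^N} = F_4
BASIS = [1, 2]  # the basis {1, x}, as 2-bit coefficient codes


def gf4_mul(a: int, b: int) -> int:
    """Multiply in F_4 = F_2[x]/(x^2 + x + 1); bit 0 = coeff of 1, bit 1 = coeff of x."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    c0 = (a0 & b0) ^ (a1 & b1)               # a1*b1*x^2 = a1*b1*(x + 1) feeds both terms
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return c0 | (c1 << 1)


def coeffs(e: int) -> List[int]:
    """Coefficients of e in the basis {1, x}."""
    return [(e >> j) & 1 for j in range(N)]


# OMEGA[l][j][k] = coefficient of b_l in b_j * b_k (symmetric in j and k).
OMEGA = [[[coeffs(gf4_mul(bj, bk))[l] for bk in BASIS] for bj in BASIS] for l in range(N)]


def trace_constraint(C: List[List[int]], H: List[List[int]], Om: List[List[int]]) -> int:
    """Tr(C Om H^T) over F_2 for n x N coefficient matrices C and H."""
    n = len(C)
    return sum(C[i][k] * Om[k][j] * H[i][j]
               for i in range(n) for j in range(N) for k in range(N)) % 2


def count_mismatches(n: int = 2) -> int:
    """Number of pairs (h, c) in F_4^n x F_4^n where the coefficients of sum_i h_i c_i
    differ from the N trace values; 0 means the reformulation holds."""
    bad = 0
    for h in itertools.product(range(4), repeat=n):
        H = [coeffs(x) for x in h]
        for c in itertools.product(range(4), repeat=n):
            C = [coeffs(x) for x in c]
            s = 0
            for hi, ci in zip(h, c):
                s ^= gf4_mul(hi, ci)  # addition in F_4 is XOR of the codes
            if coeffs(s) != [trace_constraint(C, H, OMEGA[l]) for l in range(N)]:
                bad += 1
    return bad


if __name__ == "__main__":
    print(OMEGA)               # [[[1, 0], [0, 1]], [[0, 1], [1, 1]]]
    print(count_mismatches())  # 0
```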

Definition 5. The number of codewords in the code $\mathcal{C}$ of rank $r$ ($r=0,1,\ldots,n$) is denoted as $N_{\mathcal{C}}(r)$.

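Note that $N_{\mathcal{C}}(r)$ is a random variable since $\mathcal{C}\subseteq\mathbb{F}_q^{n\times n}$ is a random subspace. This quantity can also be expressed as

$$N_{\mathcal{C}}(r) := \sum_{M\in\mathbb{F}_q^{n\times n}:\,\operatorname{rank}(M)=r}\mathbb{I}\{M\in\mathcal{C}\},\tag{43}$$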

where I{M £ ^} is the (indicator) random variable which 
takes on the value one if M G f and zero otherwise. Note that 
the matrix M is deterministic, while the code ^ is random. We 
remark that the decomposition of N<g>(r) in (l43l is different 

"The usual approach to defining linear rank-metric codes f7], [8] is the 
following: Every codeword in the codebook, c £ F™ N , is required to satisfy 
the m parity-check constraints h a ,iCi = £ F^n for a £ [m] and 

where V\ a ,i £ ^ q « and c i £ ^ q N are, respectively, the i-th elements of h a 
and c. Note that in the paper we focus on the case N = n, but make the 
distinction here to connect directly with the coding literature. We can reexpress 
each of these m constraints as N matrix trace constraints in ¥ q , per I42K as 
follows. Consider any basis B for F ? jv over ¥ q , B = {bi, . . . , bjv}, where 
bj £ F ? jv. We represent \\ a ,i and in this basis as \\ a> i = JZ^Li h a ,i jbj 
and Ci = Ci fcbj., respectively. Let H a be the n X N matrix whose 

(i, j)-th entry is the coefficient h a ,i,j £ an d C be similarly defined by the 
c i,k £ IFq. Now define Uj^k.l as th e coefficients in F 9 of the representation 
of bjbfc, i.e., bjbfc = J3; =1 u)j ;b(. Define f2; to be the symmetric 
N X N matrix whose (j, fc)-th entry is ujj ^ i. By substituting the expansions 
for h a and c into the standard parity-check definition and making use of 
the fact that the basis elements hj are linearly independent, we discover the 
following: the constraint 5Z™ =1 h a ,iCi = is equivalent to the N constraints 
Tr(Cf2;H^) = £ F, for I £ [N], If we define H a Si t for each a £ [m], 
I £ [N] to be one of the constraints in j421 . we get that the set of C matrices 
c (o satisfying j42t is the rank-metric codes defined by the h a , a £ [m], A 
simple relation between the fi; matrices holds if the basis is chosen to be a 
normal basis [41, Def. 2.32].

from that in Barg and Forney [19, Eq. (2.3)], where the authors considered and analyzed the analog of the sum

$$\tilde{N}_{\mathcal{C}}(r) := \sum_{j \in \{1,\ldots,|\mathcal{C}|\}:\, C_j \neq 0} \mathbb{I}\{\mathrm{rank}(C_j) = r\}, \tag{44}$$

where $j \in \{1, \ldots, |\mathcal{C}|\}$ indexes the (random) codewords in $\mathcal{C}$. Note that $\tilde{N}_{\mathcal{C}}(r) = N_{\mathcal{C}}(r)$ for all $r \ge 1$, but they differ when $r = 0$ ($\tilde{N}_{\mathcal{C}}(0) = 0$ while $N_{\mathcal{C}}(0) = 1$). It turns out that the sum in (43) is more amenable to analysis given that our parity-check (sensing) matrices $H_a$, $a \in [k]$, are random (as in Gallager's work [20, Theorem 2.1]), whereas in [19, Sec. II.C] the generators are random.$^{12}$ Recall that the rank-dimension ratio $\gamma$ is the limit of the ratio $r/n$ as $n \to \infty$. Using (43), we can show the following:

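Lemma 12 (Moments of $N_{\mathcal{C}}(r)$). For $r = 0$, $N_{\mathcal{C}}(r) = 1$. For $1 \le r \le n$, the mean of $N_{\mathcal{C}}(r)$ satisfies

$$q^{-k+2rn-r^2-2r} \le \mathbb{E}N_{\mathcal{C}}(r) \le 4q^{-k+2rn-r^2}. \tag{45}$$

Furthermore, the variance of $N_{\mathcal{C}}(r)$ satisfies

$$\mathrm{var}(N_{\mathcal{C}}(r)) \le \mathbb{E}N_{\mathcal{C}}(r). \tag{46}$$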
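A small numerical sanity check of the mean bounds in (45) is sketched below. It relies on a fact about the equiprobable ensemble: a fixed non-zero $M$ lies in $\mathcal{C}$ with probability $q^{-k}$, so by (43) the mean $\mathbb{E}N_{\mathcal{C}}(r)$ is $q^{-k}$ times the number of rank-$r$ matrices. The counting formula used is the standard product formula for matrices of rank exactly $r$; the specific $(q, n, k)$ triples and the function names are our own illustrative choices.

```python
import itertools
from fractions import Fraction


def num_rank_r(q, n, r):
    """Number of n x n matrices over F_q with rank exactly r (standard product formula)."""
    num, den = 1, 1
    for i in range(r):
        num *= (q ** n - q ** i) ** 2
        den *= q ** r - q ** i
    return num // den


def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = max(rows)
        rows.remove(pivot)
        if pivot == 0:
            break
        rank += 1
        top = pivot.bit_length() - 1
        rows = [x ^ pivot if x >> top & 1 else x for x in rows]
    return rank


def mean_codewords_of_rank(q, n, k, r):
    """E N_C(r) for the equiprobable ensemble: each non-zero M lies in C w.p. q^{-k}."""
    return Fraction(num_rank_r(q, n, r), q ** k) if r > 0 else Fraction(1)


# Brute-force check of the counting formula over F_2 for n = 3 (all 2^9 matrices).
n = 3
counts = [0] * (n + 1)
for rows in itertools.product(range(2 ** n), repeat=n):
    counts[gf2_rank(rows)] += 1
print(counts == [num_rank_r(2, n, r) for r in range(n + 1)])

# Check the two-sided bound (45) for a few small (q, n, k).
for q, n, k in [(2, 4, 5), (3, 3, 4), (5, 4, 10)]:
    for r in range(1, n + 1):
        mean = mean_codewords_of_rank(q, n, k, r)
        lower = Fraction(q) ** (-k + 2 * r * n - r * r - 2 * r)
        upper = 4 * Fraction(q) ** (-k + 2 * r * n - r * r)
        assert lower <= mean <= upper
```

For $n = 3$ over $\mathbb{F}_2$ the rank profile is $1, 49, 294, 168$ for $r = 0, 1, 2, 3$, which sums to $2^9$.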

The proof of Lemma 12 is provided in Appendix F. Observe from (45) that the average number of codewords with rank $r$, namely $\mathbb{E}N_{\mathcal{C}}(r)$, is exponentially large (in $n^2$) if $k < (2-\epsilon)\gamma(1-\gamma/2)n^2$ (compare to the converse in Proposition 2) and exponentially small if $k > (2+\epsilon)\gamma(1-\gamma/2)n^2$ (compare to the achievability in Proposition 3). By Chebyshev's inequality, an immediate corollary of Lemma 12 is the following:

Corollary 13 (Concentration of number of codewords of rank $r$). Let $f_n$ be any sequence such that $\lim_{n\to\infty} f_n = \infty$. Then,

$$\lim_{n\to\infty} \mathbb{P}\left(|N_{\mathcal{C}}(r) - \mathbb{E}N_{\mathcal{C}}(r)| \ge f_n\sqrt{\mathbb{E}N_{\mathcal{C}}(r)}\right) = 0. \tag{47}$$

Thus, $N_{\mathcal{C}}(r)$ concentrates to its mean in the sense of (47). A similar result for the random generator case was developed in [9, Corollary 1]. Our derivations based on Lemma 12, however, are cleaner and require fewer assumptions. We now define the notion of the minimum rank distance of a rank-metric code.

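Definition 6. The minimum rank distance of a rank-metric code $\mathcal{C}$ is defined as

$$d_R(\mathcal{C}) := \min_{C_1, C_2 \in \mathcal{C}:\, C_1 \neq C_2} \mathrm{rank}(C_1 - C_2). \tag{48}$$

By linearity of the code $\mathcal{C}$, it can be seen that the minimum rank distance in (48) can also be written as

$$d_R(\mathcal{C}) := \min_{C \in \mathcal{C}:\, C \neq 0} \mathrm{rank}(C). \tag{49}$$

Indeed, as $C_1 \neq C_2$ range over $\mathcal{C}$, the difference $C_1 - C_2$ ranges over exactly the non-zero codewords. Thus, the minimum rank distance of a linear code is equal to the minimum rank over all non-zero matrix codewords.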
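The identity between (48) and (49), together with the rank profile $N_{\mathcal{C}}(r)$ from (43), can be checked by exhaustive search on a toy code. The sketch assumes $q = 2$, $n = 3$ and $k = 4$ (values chosen only so that all $2^9$ matrices can be enumerated) and draws the sensing matrices uniformly, as in the equiprobable ensemble; the variable names are hypothetical.

```python
import itertools
import random

# Illustration only: q = 2, n = 3, k = 4 equiprobable parity checks (all values assumed).
q, n, k = 2, 3, 4
random.seed(3)
# Each H_a is stored as a tuple of n*n bits, i.e., vec(H_a).
Hs = [tuple(random.randrange(q) for _ in range(n * n)) for _ in range(k)]


def rank_mod2(vec):
    """Rank over F_2 of the n x n matrix whose row-major vectorization is vec."""
    rows = [int("".join(map(str, vec[i * n:(i + 1) * n])), 2) for i in range(n)]
    rank = 0
    while rows:
        pivot = max(rows)
        rows.remove(pivot)
        if pivot == 0:
            break
        rank += 1
        top = pivot.bit_length() - 1
        rows = [x ^ pivot if x >> top & 1 else x for x in rows]
    return rank


# The code C of (42): all M with <M, H_a> = 0 for every a (found by exhaustive search).
code = [M for M in itertools.product(range(q), repeat=n * n)
        if all(sum(m * h for m, h in zip(M, Ha)) % q == 0 for Ha in Hs)]

# N_C(r) as in (43): tally codewords by rank.
N_C = [0] * (n + 1)
for M in code:
    N_C[rank_mod2(M)] += 1

# (49): minimum rank over non-zero codewords.
d49 = min(rank_mod2(M) for M in code if any(M))
# (48): minimum rank of pairwise differences (subtraction equals XOR over F_2).
d48 = min(rank_mod2(tuple(a ^ b for a, b in zip(C1, C2)))
          for C1, C2 in itertools.combinations(code, 2))
print(N_C, d48 == d49)
```

The rank profile depends on the random draw, but `d48 == d49` holds for every linear code.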

Definition 7. The relative minimum rank distance of a code $\mathcal{C} \subset \mathbb{F}_q^{n\times n}$ is $d_R(\mathcal{C})/n$.

Note that the relative minimum rank distance is a random variable taking on values in the unit interval. In this section,

12 Indeed, if the generators are random, it is easier to derive the statistics of the number of codewords of rank $r$ using (44) instead of (43).



we assume there exists some $\alpha \in (0, 1)$ such that $k/n^2 \to \alpha$ (cf. Section IV-B). This is the scaling regime of interest.

Proposition 14 (Asymptotic linear independence). Assume that each random matrix $H_a \in \mathbb{F}_q^{n\times n}$ consists of independent entries that are drawn according to the pmf in (39). Let $m := \dim(\mathrm{span}\{\mathrm{vec}(H_1), \ldots, \mathrm{vec}(H_k)\})$. If $\delta \in \Omega(\frac{\log n}{n})$, then $m/k \to 1$ almost surely (a.s.).

The proof of this proposition is a consequence of a result by Blömer et al. [42]. We provide the details in Appendix G.

We would now like to define the notion of the rate of a random code. Strictly speaking, since $\mathcal{C}$ is a random linear code, the rate of the code should be defined as the random variable $\tilde{R}_n := 1 - m/n^2$. However, a consequence of Proposition 14 is that $\tilde{R}_n/(1 - k/n^2) \to 1$ a.s. if $\delta \in \Omega(\frac{\log n}{n})$. Note that this prescribed rate of decay of $\delta$ subsumes the equiprobable model (of interest in this section) as a special case. (Take $\delta = (q-1)/q$ to be constant.) In light of Proposition 14, we adopt the following definition:

Definition 8. The rate of the linear rank-metric code [as in (42)] is defined as

$$R_n := \frac{n^2 - k}{n^2} = 1 - \frac{k}{n^2}. \tag{50}$$

The limit of $R_n$ in (50) is denoted as $R \in [0, 1]$. Note also that $\tilde{R}_n/R \to 1$ a.s.

Proposition 15 (Lower bound on relative minimum distance). Fix $\epsilon > 0$. For any $R \in [0, 1]$, the probability that the equiprobable linear code in (42) has relative minimum rank distance less than $1 - \sqrt{R} - \epsilon$ goes to zero as $n \to \infty$.

Proof: Assume$^{13}$ $\epsilon \in (0, 2(1-\gamma))$ and define the positive constant $\epsilon' := 2\epsilon(1-\gamma) - \epsilon^2$. Consider a sequence of ranks $r$ such that $r/n \to \gamma \le 1 - \sqrt{R} - \epsilon$. Fix $\eta = \epsilon'/2 > 0$. Then, by Markov's inequality and (45), we have

$$\mathbb{P}(N_{\mathcal{C}}(r) \ge 1) \le \mathbb{E}N_{\mathcal{C}}(r) \le 4q^{-n^2\left[\frac{k}{n^2} - 2\gamma(1-\gamma/2) - \eta\right]} \tag{51}$$

for all $n > N_{\epsilon'}$. Since $\gamma \le 1 - \sqrt{R} - \epsilon$, we may assert by invoking the definition of $R$ that $k \ge (2\gamma(1-\gamma/2) + \epsilon')n^2$. Hence, the exponent in square brackets in (51) is no smaller than $\epsilon'/2$. This implies that $\mathbb{P}(N_{\mathcal{C}}(r) \ge 1) \to 0$ or, equivalently, $\mathbb{P}(N_{\mathcal{C}}(r) = 0) \to 1$. In other words, there are no matrices of rank $r$ in the equiprobable linear code with probability at least $1 - 4q^{-\epsilon' n^2/2}$ for all $n > N_{\epsilon'}$. ■
We now introduce some additional notation. We say that two positive sequences $\{a_n\}_{n\in\mathbb{N}}$ and $\{b_n\}_{n\in\mathbb{N}}$ are equal to second order in the exponent (denoted $a_n \doteq b_n$) if

$$\lim_{n\to\infty} \frac{1}{n^2}\log_q \frac{a_n}{b_n} = 0. \tag{52}$$

Proposition 16 (Concentration of relative minimum distance). Fix $\epsilon > 0$. For any $R \in [0, 1]$, if $r$ is a sequence of ranks such that $r/n \to \gamma \ge 1 - \sqrt{R} + \epsilon$, then the probability that $N_{\mathcal{C}}(r) \doteq q^{-k + 2\gamma(1-\gamma/2)n^2}$ goes to one as $n \to \infty$.

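13 The restriction that $\epsilon < 2(1-\gamma)$ is not a serious one since the validity of the claim in Proposition 15 for some $\epsilon_0 > 0$ implies the same for all $\epsilon > \epsilon_0$.

Proof: If the sequence of ranks $r$ is such that $r/n \to \gamma \ge 1 - \sqrt{R} + \epsilon$, then the average number of matrices in the code of rank $r$, namely $\mathbb{E}N_{\mathcal{C}}(r)$, is exponentially large. By Markov's inequality and the triangle inequality,

$$\mathbb{P}\left(|N_{\mathcal{C}}(r) - \mathbb{E}N_{\mathcal{C}}(r)| \ge t\right) \le \frac{\mathbb{E}|N_{\mathcal{C}}(r) - \mathbb{E}N_{\mathcal{C}}(r)|}{t} \le \frac{2\mathbb{E}N_{\mathcal{C}}(r)}{t}. \tag{53}$$

Choose $t := q^{-k + (2\gamma(1-\gamma/2) + \eta)n^2 + n}$, where $\eta$ is given in the proof of Proposition 15. Then, applying (45) to (53) yields

$$\mathbb{P}\left(|N_{\mathcal{C}}(r) - \mathbb{E}N_{\mathcal{C}}(r)| \ge t\right) \le 8q^{-n}. \tag{54}$$

Hence, $N_{\mathcal{C}}(r) \in (\mathbb{E}N_{\mathcal{C}}(r) - t, \mathbb{E}N_{\mathcal{C}}(r) + t)$ with probability exceeding $1 - 8q^{-n}$. Furthermore, it is easy to verify that $\mathbb{E}N_{\mathcal{C}}(r) \pm t \doteq q^{-k + 2\gamma(1-\gamma/2)n^2}$, as desired. ■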
Propositions 15 and 16 allow us to conclude that with probability approaching one (exponentially fast) as $n \to \infty$, the relative minimum rank distance of the equiprobable linear code in (42) is contained in the interval $(1 - \sqrt{R} - \epsilon, 1 - \sqrt{R} + \epsilon)$ for all $R \in [0, 1]$. The analog of the Gilbert-Varshamov (GV) distance [19, Sec. II.C] is thus

$$\gamma_{\mathrm{GV}}(R) := 1 - \sqrt{R}. \tag{55}$$

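Indeed, by substituting the definition of $R$ into $N_{\mathcal{C}}(r)$ in Proposition 16, we see that a typical (in the sense of [19]) equiprobable linear rank-metric code has distance distribution:

$$N_{\mathrm{typ}}(r) \begin{cases} = 0, & \gamma < \gamma_{\mathrm{GV}}(R), \\ \doteq q^{n^2[R - (1-\gamma)^2]}, & \gamma \ge \gamma_{\mathrm{GV}}(R) + \epsilon. \end{cases} \tag{56}$$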
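To see the threshold behavior in (55) and (56) numerically, the sketch below evaluates the normalized exponent $\frac{1}{n^2}\log_q N_{\mathrm{typ}}(r)$ as a function of $\gamma$, and checks that $R - (1-\gamma)^2$ coincides with the exponent $-k/n^2 + 2\gamma(1-\gamma/2)$ of Proposition 16 once $k/n^2 \to 1 - R$. The rate $R = 0.5$, the margin `eps`, and the function names are illustrative assumptions.

```python
import math


def gamma_gv(R):
    """GV-type relative rank distance from (55): 1 - sqrt(R)."""
    return 1.0 - math.sqrt(R)


def typical_exponent(R, gamma, eps=1e-3):
    """Normalized exponent (1/n^2) log_q N_typ(r) from (56), where r/n -> gamma.

    Returns None for gamma < gamma_GV(R) (no codewords of that rank w.h.p.) and
    R - (1 - gamma)^2 for gamma >= gamma_GV(R) + eps. The window in between is
    not covered by (56), so it raises. The margin eps is an assumed parameter."""
    if gamma < gamma_gv(R):
        return None
    if gamma >= gamma_gv(R) + eps:
        return R - (1.0 - gamma) ** 2
    raise ValueError("gamma falls in the window not covered by (56)")


R = 0.5          # assumed limiting rate, so that k/n^2 -> 1 - R
alpha = 1.0 - R
for gamma in (0.1, 0.25, 0.4, 0.6):
    e = typical_exponent(R, gamma)
    # Exponent of Proposition 16 written directly: -k/n^2 + 2 gamma (1 - gamma/2).
    direct = None if e is None else -alpha + 2 * gamma * (1 - gamma / 2)
    print(f"gamma={gamma}: exponent={e}, direct={direct}")
```

With $R = 0.5$, $\gamma_{\mathrm{GV}} \approx 0.293$, so the first two ranks have no typical codewords while the last two have exponentially many.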



We again remark that Loidreau in [9, Sec. 5] also derived results for uniformly random linear codes in the rank metric that are somewhat similar to Propositions 15 and 16. However, our derivations are more straightforward and require fewer assumptions. As mentioned above, we assume that the parity-check matrices $H_a$, $a \in [k]$, are random (akin to [20, Theorem 2.1]), while the assumption in [9, Sec. 5] is that the generators are random and linearly independent. Furthermore, to the best of our knowledge, there are no previous studies on the minimum distance properties for the sparse parity-check matrix setting. We do this in Section VII-B.

From the rank distance properties, we can re-derive the achievability (weak recovery) result in Proposition 3 by using the definition of $R$ and solving the following inequality for $k$:

$$1 - \sqrt{R} - \epsilon > \gamma. \tag{57}$$

This provides geometric intuition as to why the min-rank decoder succeeds on average: the typical relative minimum rank distance of the code should exceed the rank-dimension ratio for successful error correction. We derive a stronger condition (known as the strong recovery condition) in Section VII-C.
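Solving (57) for $k$ explicitly, assuming $\gamma + \epsilon < 1$ and writing $1 - R = \lim_{n} k/n^2$ from (50), gives the following short derivation. It is a worked sketch: the $O(\epsilon)$ slack in the last line is what, for fixed $\gamma > 0$ and up to relabeling $\epsilon$, takes the form $k > (2+\epsilon)\gamma(1-\gamma/2)n^2$ of Proposition 3.

```latex
\begin{align*}
1 - \sqrt{R} - \epsilon > \gamma
  &\iff \sqrt{R} < 1 - \gamma - \epsilon \\
  &\iff R < (1 - \gamma - \epsilon)^2 \qquad (\text{since } 1 - \gamma - \epsilon > 0) \\
  &\iff \lim_{n\to\infty}\frac{k}{n^2} = 1 - R > 1 - (1 - \gamma - \epsilon)^2
     = 2\gamma\Big(1 - \frac{\gamma}{2}\Big) + 2\epsilon(1 - \gamma) - \epsilon^2 .
\end{align*}
```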

B. Distance Properties of Sparse Rank-Metric Codes 

In this section, we derive the analog of Proposition 15 for the case where the code $\mathcal{C}$ is characterized by sparse sensing (or measurement or parity-check) matrices $H_a$, $a \in [k]$.



Definition 9. We say that $\mathcal{C}$ is a $\delta$-sparse linear rank-metric code if $\mathcal{C}$ is as in (42) and $H_a$, $a \in [k]$, are random matrices in which each entry is statistically independent and drawn from the pmf $P_h(\cdot\,; \delta, q)$ defined in (39).



To analyze the number of matrices of rank $r$ in this random ensemble, $N_{\mathcal{C}}(r)$, we partition the sum in (43) into subsets of matrices based on their Hamming weight, i.e.,

$$N_{\mathcal{C}}(r) = \sum_{d=0}^{n^2} \sum_{M \in \mathbb{F}_q^{n\times n}:\, \mathrm{rank}(M) = r,\, \|M\|_0 = d} \mathbb{I}\{M \in \mathcal{C}\}. \tag{58}$$

Define $\theta(d; \delta, q, k) := \left[q^{-1} + (1 - q^{-1})\left(1 - \frac{\delta}{1 - q^{-1}}\right)^{d}\right]^{k}$. As shown in Lemma 21 in Appendix E, this is the probability that a non-zero matrix $M$ of Hamming weight $d$ belongs to the $\delta$-sparse code $\mathcal{C}$. We can demonstrate the following important bound for the $\delta$-sparse linear rank-metric code:

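Lemma 17 (Mean of $N_{\mathcal{C}}(r)$ for sparse codes). For $r = 0$, $N_{\mathcal{C}}(r) = 1$. If $1 \le r \le n$ and $\eta > 0$,

$$\mathbb{E}N_{\mathcal{C}}(r) \le 2^{n^2 H_2(\beta)}(q-1)^{\beta n^2}(1-\delta)^{k} + 4n^2 q^{n^2\left[2\gamma(1-\gamma/2) + \eta + \frac{1}{n^2}\log_q \theta(\lceil \beta n^2 \rceil;\, \delta, q, k)\right]} \tag{59}$$

for all $\beta \in [0, 1/2]$ and all $n > N_\eta$.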
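The closed form for $\theta(d; \delta, q, k)$ can be checked against a direct computation. The sketch assumes that $q$ is prime (so that $\mathbb{F}_q$ is arithmetic modulo $q$) and reads the sparse pmf in (39) as putting mass $1 - \delta$ on zero and $\delta/(q-1)$ on each non-zero element; both are assumptions of this illustration. Under that pmf, multiplying an entry by a non-zero $[M]_{i,j}$ does not change its distribution, so $\langle M, H_a\rangle$ is a sum of $d$ i.i.d. entries, and the code convolves the pmf $d$ times.

```python
from fractions import Fraction


def sparse_pmf(q, delta):
    """Assumed entry pmf: P(0) = 1 - delta and P(h) = delta / (q - 1) for each h != 0."""
    return [1 - delta] + [delta / (q - 1)] * (q - 1)


def prob_inner_product_zero(q, delta, d):
    """P(h_1 + ... + h_d = 0) for d i.i.d. entries over Z_q (q prime), by convolution."""
    pmf = sparse_pmf(q, delta)
    dist = [Fraction(1)] + [Fraction(0)] * (q - 1)  # distribution of the empty sum
    for _ in range(d):
        new = [Fraction(0)] * q
        for s, ps in enumerate(dist):
            for h, ph in enumerate(pmf):
                new[(s + h) % q] += ps * ph
        dist = new
    return dist[0]


def theta(d, delta, q, k):
    """Closed form [q^{-1} + (1 - q^{-1}) (1 - delta / (1 - q^{-1}))^d]^k."""
    qi = Fraction(1, q)
    return (qi + (1 - qi) * (1 - delta / (1 - qi)) ** d) ** k


q, delta, k = 3, Fraction(1, 5), 4
for d in range(6):
    # Independence across the k sensing matrices gives the k-th power.
    assert prob_inner_product_zero(q, delta, d) ** k == theta(d, delta, q, k)
print(theta(1, delta, q, k) == (1 - delta) ** k)  # weight-1 matrices: (1 - delta)^k
```

Setting $\delta = (q-1)/q$ makes the pmf uniform and gives $\theta = q^{-k}$ for every $d \ge 1$, matching the equiprobable case.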



By using the sum in (58), one sees that this lemma can be justified in exactly the same way as Theorem 11 (see the steps leading to (81) and (82) in Appendix E). Hence, we omit its proof. Lemma 17 allows us to find a tight upper bound on the expectation of $N_{\mathcal{C}}(r)$ for the sparse linear rank-metric code by optimizing over the free parameter $\beta \in [0, 1/2]$. It turns out that $\beta = \ldots$ is optimum. In analogy to Proposition 15 for the equiprobable linear rank-metric code, we can demonstrate the following for the sparse linear rank-metric code.

Proposition 18 (Lower bound on relative minimum distance for sparse codes). Fix $\epsilon > 0$ and assume that $\delta = \Omega(\frac{\log n}{n}) \cap o(1)$. For any $R \in [0, 1]$, the probability that the sparse linear code has relative minimum distance less than $1 - \sqrt{R} - \epsilon$ goes to zero as $n \to \infty$.

Proof: The condition on the minimum distance implies that $k > (2 + \tilde{\epsilon})\gamma(1-\gamma/2)n^2$ for some $\tilde{\epsilon} > 0$ (for sufficiently small $\epsilon$); see the detailed argument in the proof of Proposition 15. This implies, from Theorem 11, Lemma 17 and Markov's inequality, that $\mathbb{P}(N_{\mathcal{C}}(r) \ge 1) \to 0$. ■
Proposition 18 asserts that the relative minimum rank distance of a $\delta = \Omega(\frac{\log n}{n})$-sparse linear rank-metric code is at least $1 - \sqrt{R} - \epsilon$ w.h.p. Remarkably, this property is exactly the same as that of a (dense) linear code (cf. Proposition 15) in which the entries in the parity-check matrices $H_a$ are statistically independent and equiprobable in $\mathbb{F}_q$. The fact that the (lower bounds on the) minimum distances of both ensembles of codes coincide explains why the min-rank decoder matches the information-theoretic lower bound (Proposition 2) in the sparse setting (Theorem 11), just as in the dense one (Proposition 3). Note that only an upper bound on $\mathbb{E}N_{\mathcal{C}}(r)$ as in (59) is required to make this claim.

C. Strong Recovery

We now utilize the insights gleaned from this section to derive results for strong recovery (see Section III-D and [27, Sec. 2] for definitions) of low-rank matrices from linear measurements. Recall that in strong recovery, we are interested in recovering all matrices whose ranks are no larger than $r$. We contrast this to weak recovery, where a matrix $X$ (of low rank) is fixed and we ask how many random measurements are needed to estimate $X$ reliably.

Proposition 19 (Strong recovery for uniform measurement model). Fix $\epsilon > 0$. Under the uniform measurement model, the min-rank decoder recovers all matrices of rank less than or equal to $r$ with probability approaching one as $n \to \infty$ if

$$k > (4 + \epsilon)\gamma(1-\gamma)n^2. \tag{60}$$

We contrast this to the weak achievability result (Proposition 3), in which $X$ with $\mathrm{rank}(X) \le r$ was fixed and we showed that if $k > (2+\epsilon)\gamma(1-\gamma/2)n^2$, the min-rank decoder recovers $X$ w.h.p. Thus, Proposition 19 says that if $\gamma$ is small, roughly twice as many measurements are needed for strong recovery vis-à-vis weak recovery. These fundamental limits (and the increase by a factor of 2 for strong recovery) are exactly analogous to those developed by Draper and Malekpour [29] in the context of compressed sensing over finite fields and by Eldar et al. [27] for the problem of rank minimization over the reals. Given our derivations in the preceding subsections, the proof of this result is straightforward.

Proof: We showed in Proposition 15 that with probability approaching one (exponentially fast), the relative minimum distance of $\mathcal{C}$ is no smaller than $1 - \sqrt{R} - \epsilon$ for any $\epsilon > 0$. As such, to guarantee strong recovery, we need the decoding regions (associated to each codeword in $\mathcal{C}$) to be disjoint. In other words, the rank distance between any two distinct codewords $C_1, C_2 \in \mathcal{C}$ must exceed $2r$. See Fig. 4 for an illustration. In terms of the relative minimum rank distance $1 - \sqrt{R} - \epsilon$, this requirement translates to$^{14}$

$$1 - \sqrt{R} - \epsilon > 2\gamma. \tag{61}$$

Rearranging this inequality and using the definition of $R$ [limit of $R_n$ in (50)], as we did in Proposition 15, yields the required number of measurements prescribed. ■
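For completeness, the rearrangement of (61) is written out below as a short derivation, assuming $2\gamma + \epsilon < 1$ and using $1 - R = \lim_{n} k/n^2$ from (50). The last line compares the leading terms of the strong condition (60) and the weak condition $k > (2+\epsilon)\gamma(1-\gamma/2)n^2$, which is where the factor of roughly 2 for small $\gamma$ comes from.

```latex
\begin{align*}
1 - \sqrt{R} - \epsilon > 2\gamma
  &\iff R < (1 - 2\gamma - \epsilon)^2 \qquad (\text{assuming } 2\gamma + \epsilon < 1) \\
  &\iff 1 - R > 1 - (1 - 2\gamma - \epsilon)^2
     = 4\gamma(1 - \gamma) + 2\epsilon(1 - 2\gamma) - \epsilon^2 , \\
\frac{4\gamma(1 - \gamma)}{2\gamma(1 - \gamma/2)}
  &= \frac{2(1 - \gamma)}{1 - \gamma/2} \;\longrightarrow\; 2 \quad \text{as } \gamma \to 0 .
\end{align*}
```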
In analogy to Proposition 19, we can show the following for the sparse model.

Proposition 20 (Strong recovery for sparse measurement model). Fix $\epsilon > 0$. Under the $\delta = \Omega(\frac{\log n}{n})$-sparse measurement model, the min-rank decoder recovers all matrices of rank less than or equal to $r$ with probability approaching one as $n \to \infty$ if (60) holds.

Proof: The proof uses Proposition 18 and follows along the exact same lines as that of Proposition 19. ■

14 The strong recovery requirement in (61) is analogous to the well-known fact that in the binary Hamming case, in order to correct any vector $\mathbf{r} = \mathbf{c} + \mathbf{e}$ corrupted with $t$ errors (i.e., $\|\mathbf{e}\|_0 = t$) using minimum distance decoding, we must use a code with minimum distance at least $2t + 1$.




Fig. 4. For strong recovery, the decoding regions associated to each codeword $C \in \mathcal{C}$ have to be disjoint, resulting in the criterion in (61).



VIII. Reduction in the Complexity of the 
Min-Rank Decoder 

In this section, we devise a procedure to reduce the complexity of min-rank decoding (vis-à-vis exhaustive search). This procedure is inspired by techniques in the cryptography literature [43], [44]. We adapt the techniques to our problem, which is somewhat different. As we mentioned in Section VII, the codewords in this paper are matrices rather than vectors whose elements belong to an extension field [43], [44].

Recall that in min-rank decoding (12), we search for a matrix $X \in \mathbb{F}_q^{N\times n}$ of minimum rank that satisfies the linear constraints. In this section, for clarity of exposition, we differentiate between the number of rows ($N$) and the number of columns ($n$) in $X$. The vector $\mathbf{y}^k$ is known as the syndrome.

We first suppose that the minimum rank in (12) is known to be equal to some integer $r \le \min\{N, n\}$. Since our proposed algorithm requires exponentially many elementary operations (addition and multiplication) in $\mathbb{F}_q$, this assumption does not affect the time complexity significantly. Then the problem in (12) reduces to a satisfiability problem: Given an integer $r$, a collection of parity-check matrices $H_a$, $a \in [k]$, and a syndrome vector $\mathbf{y}^k$, find (if possible) a matrix $X \in \mathbb{F}_q^{N\times n}$ of rank exactly equal to $r$ that satisfies the linear constraints in (12). Note that the constraints in (12) are equivalent to $\langle \mathrm{vec}(H_a), \mathrm{vec}(X)\rangle = y_a$, $a \in [k]$.

We first claim that we can, without loss of generality, assume that $\mathbf{y}^k = \mathbf{0}$, i.e., the constraints in (12) read

$$\langle H_a, X\rangle = 0, \quad a \in [k]. \tag{62}$$

We justify this claim as follows: Consider the new syndrome-augmented vectors $[\mathrm{vec}(H_a); y_a]^T \in \mathbb{F}_q^{Nn+1}$ for every $a \in [k]$. Then, every solution $\mathrm{vec}(X')$ of the system of equations

$$\langle [\mathrm{vec}(H_a); y_a], \mathrm{vec}(X')\rangle = 0, \quad a \in [k] \tag{63}$$

can be partitioned into two parts, $\mathrm{vec}(X') = [\mathrm{vec}(X_1); x_2]$, where $\mathrm{vec}(X_1) \in \mathbb{F}_q^{Nn}$ and $x_2 \in \mathbb{F}_q$. Thus, every solution of (63) satisfies one of two conditions:

• $x_2 = 0$. In this case $X_1$ is a solution to the linear equations in (62).

• $x_2 \neq 0$. In this case $X_1$ solves $\langle H_a, X_1\rangle = -x_2 y_a$. Thus, $-x_2^{-1} X_1$ solves (12).
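Both cases of the reduction above can be confirmed by brute force on a toy instance. The sketch enumerates all solutions of the augmented homogeneous system (63), rescales those with $x_2 \neq 0$ by $-x_2^{-1}$, and compares the result with the solution set of the original constraints in (12). It assumes a prime $q$, so that $\mathbb{F}_q$ arithmetic is integer arithmetic modulo $q$; the sizes and seed are arbitrary, and `pow(x, -1, q)` (modular inverse) needs Python 3.8 or later.

```python
import itertools
import random

# Illustration only: prime q = 3 and N = n = 2, k = 2, small enough to enumerate
# every vector in F_q^{Nn+1}. All sizes and the random seed are assumptions.
q, N, n, k = 3, 2, 2, 2
random.seed(1)
H = [[random.randrange(q) for _ in range(N * n)] for _ in range(k)]  # vec(H_a)
X_true = [random.randrange(q) for _ in range(N * n)]                 # vec(X)


def inner(u, v):
    """<u, v> over F_q (q prime, so arithmetic is modulo q)."""
    return sum(a * b for a, b in zip(u, v)) % q


y = [inner(Ha, X_true) for Ha in H]  # syndrome y_a = <H_a, X>

# All solutions of the original constraints <H_a, X> = y_a, i.e., (12).
original = {X for X in itertools.product(range(q), repeat=N * n)
            if all(inner(Ha, X) == ya for Ha, ya in zip(H, y))}

# All solutions of the homogeneous system (63) built from [vec(H_a); y_a].
augmented = [Ha + [ya] for Ha, ya in zip(H, y)]
homogeneous = [Z for Z in itertools.product(range(q), repeat=N * n + 1)
               if all(inner(A, Z) == 0 for A in augmented)]

recovered = set()
for Z in homogeneous:
    X1, x2 = Z[:-1], Z[-1]
    if x2 != 0:
        # <H_a, X1> = -x2 * y_a, so -x2^{-1} X1 satisfies (12).
        scale = (-pow(x2, -1, q)) % q
        recovered.add(tuple(scale * v % q for v in X1))

print(recovered == original)
```

The script prints `True`: rescaling the solutions with $x_2 \neq 0$ recovers exactly the coset of solutions of (12).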

The reduction from (12) to (62) is also known as coset decoding. Now, observe that since $X$ has rank equal to $r$ (which is assumed known), it can be written as

$$X = \sum_{l=1}^{r} \mathbf{u}_l \mathbf{v}_l^T = UV^T, \tag{64}$$

where each of the vectors $\mathbf{u}_l \in \mathbb{F}_q^N$ and $\mathbf{v}_l \in \mathbb{F}_q^n$. The matrices $U \in \mathbb{F}_q^{N\times r}$ and $V \in \mathbb{F}_q^{n\times r}$ are of (full) rank $r$ and are referred to as the basis matrix and the coefficient matrix, respectively. The linear system of equations in (62) can be expanded as

$$\sum_{l=1}^{r}\sum_{i=1}^{N}\sum_{j=1}^{n} [H_a]_{i,j}\, u_{l,i}\, v_{l,j} = 0, \quad a \in [k], \tag{65}$$



where $\mathbf{u}_l = [u_{l,1}, \ldots, u_{l,N}]^T$ and $\mathbf{v}_l = [v_{l,1}, \ldots, v_{l,n}]^T$. Thus, we need to solve a system of quadratic equations in the basis elements $u_{l,i}$ and the coefficients $v_{l,j}$.

A. Naive Implementation

A naive way to find a consistent $U$ and $V$ for (65) is to employ the following algorithm:

1) Start with $r = 1$.

2) Enumerate all bases $U = \{u_{l,i} : i \in [N], l \in [r]\}$.

3) For each basis, solve (if possible) the resulting linear system of equations in $V = \{v_{l,j} : j \in [n], l \in [r]\}$.

4) If a consistent set of coefficients $V$ exists (i.e., (65) is satisfied), terminate and set $X = UV^T$. Else increment $r \leftarrow r + 1$ and go to step 2.

The linear system in step 3 can be solved easily if the number of equations is less than or equal to the number of unknowns, i.e., if $nr \ge k$. However, this is usually not the case since, for successful recovery, $k$ has to satisfy the achievability condition of Proposition 3, so, in general, there are more equations (linear constraints) than unknowns. We attempt to solve for (if possible) a consistent $V$; otherwise, we increment the guessed rank $r$. The computational complexity of this naive approach (assuming $r$ is known, so that no iterations over $r$ are needed) is $O((nr)^3 q^{Nr})$, since there are $q^{Nr}$ distinct bases and solving the linear system via Gaussian elimination requires at most $O((nr)^3)$ operations in $\mathbb{F}_q$.

B. Simple Observations to Reduce the Search for the Basis U

We now use ideas from [43], [24] and make two simple observations to dramatically reduce the search for the basis in step 2 of the above naive implementation.

Observation (A): Note that if $X$ solves (62), so does $\rho X$ for any non-zero $\rho \in \mathbb{F}_q$. Hence, without loss of generality, we may scale the $(1,1)$ element of $U$ to be equal to 1. The number of bases we need to enumerate may thus be reduced by a factor of $q$.

Observation (B): Note that the decomposition $X = UV^T$ is not unique. Indeed, if $X = UV^T$, we may also decompose $X$ as $X = \tilde{U}\tilde{V}^T$, where $\tilde{U} = UT$, $\tilde{V} = VT^{-T}$, and $T$ is any invertible $r \times r$ matrix over $\mathbb{F}_q$. We say that two bases $U, \tilde{U}$ are equivalent, denoted $U \sim \tilde{U}$, if there exists an invertible matrix $T$ such that $\tilde{U} = UT$. The equivalence relation $\sim$ induces a partition of the set of $\mathbb{F}_q^{N\times r}$ matrices.

Let $[U] := \{\tilde{U} \in \mathbb{F}_q^{N\times r} : \tilde{U} \sim U\}$ be the equivalence class of matrices containing the matrix $U$. From the preceding discussion on the indeterminacies in the decomposition of the low-rank matrix $X$, we observe that the complexity involved in the enumeration of all $\mathbb{F}_q^{N\times r}$ matrices in step 2 of the naive implementation can be reduced by only enumerating the different equivalence classes induced by $\sim$. More precisely, we find (if possible) coefficients $V$ for a basis $U$ from each equivalence class, e.g., $U_1 \in [U_1], \ldots, U_m \in [U_m]$. Note that the number of equivalence classes (by Lagrange's theorem) is at most

$$\frac{q^{Nr}}{\Phi_q(r,r)} \le 4q^{r(N-r)}, \tag{66}$$

where, recall from Section III-E, $\Phi_q(r,r)$ is the number of non-singular matrices in $\mathbb{F}_q^{r\times r}$. The inequality arises from the fact that $\Phi_q(r,r) \ge \frac{1}{4}q^{r^2}$, a simple consequence of [..., Cor. 4]. Algorithmically, we can enumerate the equivalence classes by first considering all matrices of the form

$$U = \begin{bmatrix} I_{r\times r} \\ Q \end{bmatrix}, \tag{67}$$



where $I_{r\times r}$ is the identity matrix of size $r$, and $Q$ takes on all possible values in $\mathbb{F}_q^{(N-r)\times r}$. Note that if $Q$ and $\tilde{Q}$ are distinct, the corresponding $U = [I; Q^T]^T$ and $\tilde{U} = [I; \tilde{Q}^T]^T$ belong to different equivalence classes. However, the top $r$ rows of a basis need not be linearly independent, so we have yet to consider all equivalence classes. Hence, we subsequently permute the rows of each previously considered $U$ to ensure every equivalence class is considered.
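The counting behind (66) and the need for the row permutations after (67) can be checked for $q = 2$ by enumeration. The sketch counts equivalence classes of full-rank bases (two full-rank bases are equivalent exactly when they share a column space, so the count is a Gaussian binomial coefficient), checks them against the bound in (66), and compares with the $q^{r(N-r)}$ systematic bases of the form (67). The function names are hypothetical, and only full-rank bases are counted, which is what the decoder needs.

```python
import itertools
from fractions import Fraction


def phi(q, m, r):
    """Number of full-rank m x r matrices over F_q: prod_{i<r} (q^m - q^i)."""
    out = 1
    for i in range(r):
        out *= q ** m - q ** i
    return out


def num_classes(q, N, r):
    """Equivalence classes of full-rank N x r bases under U ~ U T (T invertible).

    Full-rank bases are equivalent iff they share a column space, so this is the
    number of r-dimensional subspaces of F_q^N (a Gaussian binomial coefficient)."""
    return phi(q, N, r) // phi(q, r, r)


def brute_force_classes(N, r):
    """Count distinct column spaces of full-rank N x r matrices over F_2 by enumeration."""
    spaces = set()
    for cols in itertools.product(range(2 ** N), repeat=r):  # columns as N-bit masks
        span = {0}
        for c in cols:
            span |= {s ^ c for s in span}
        if len(span) == 2 ** r:  # the r columns are linearly independent
            spaces.add(frozenset(span))
    return len(spaces)


for N in range(1, 5):
    for r in range(1, N + 1):
        exact = num_classes(2, N, r)
        bound = Fraction(2 ** (N * r), phi(2, r, r))  # left-hand side of (66)
        assert brute_force_classes(N, r) == exact
        assert exact <= bound <= 4 * 2 ** (r * (N - r))
print(num_classes(2, 4, 2), 2 ** (2 * (4 - 2)))  # classes vs. systematic bases [I; Q]
```

For $N = 4$ and $r = 2$ this prints `35 16`: there are 35 classes but only 16 systematic bases of the form (67), which is why the rows must also be permuted.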

From the considerations in (A) and (B), the computational complexity can be reduced from $O((nr)^3 q^{Nr})$ to $O((nr)^3 q^{r(N-r)-1})$. By further noting that there is symmetry between the basis matrix $U$ and the coefficient matrix $V$, we see that the resulting computational complexity is $O((\max\{n,N\}r)^3 q^{r(\min\{n,N\}-r)-1})$. Finally, to incorporate the fact that $r$ is unknown, we start the procedure assuming $r = 1$, proceed to $r \leftarrow r + 1$ if there does not exist a consistent solution, and so on, until a consistent $(U, V)$ pair is found. The resulting computational complexity is thus $O(r(\max\{n,N\}r)^3 q^{r(\min\{n,N\}-r)-1})$.
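The naive procedure of Section VIII-A can be sketched directly for $q = 2$. The version below is a minimal illustration under our own simplifications: it solves the inhomogeneous constraints $\langle H_a, UV^T\rangle = y_a$ for $V$ instead of first applying the coset reduction to (62), it omits Observations (A) and (B), and all names and the toy instance are hypothetical. For a fixed $U$ the constraints are linear in $V$, because $\langle H_a, UV^T\rangle = \sum_{j,l} [H_a^T U]_{j,l}\, v_{l,j}$.

```python
import itertools
import random


def solve_gf2(rows, num_vars):
    """Solve a linear system over F_2; each row is (coefficient bitmask, rhs bit).
    Returns one solution (free variables set to 0), or None if inconsistent."""
    pivots = []  # (pivot column, coefficient bitmask, rhs), kept in reduced form
    for coeffs, rhs in rows:
        for col, pc, pr in pivots:  # eliminate existing pivot columns
            if coeffs >> col & 1:
                coeffs, rhs = coeffs ^ pc, rhs ^ pr
        if coeffs == 0:
            if rhs:
                return None  # the row reads 0 = 1
            continue
        col = coeffs.bit_length() - 1
        # Clear the new pivot column from earlier pivot rows.
        pivots = [(c, pc ^ coeffs, pr ^ rhs) if pc >> col & 1 else (c, pc, pr)
                  for c, pc, pr in pivots]
        pivots.append((col, coeffs, rhs))
    solution = [0] * num_vars
    for col, _, pr in pivots:
        solution[col] = pr
    return solution


def gf2_rank(M):
    """Rank over F_2 of a 0/1 matrix given as a list of rows."""
    rows, rank = [int("".join(map(str, row)), 2) for row in M], 0
    while rows:
        pivot = max(rows)
        rows.remove(pivot)
        if pivot == 0:
            break
        rank += 1
        top = pivot.bit_length() - 1
        rows = [x ^ pivot if x >> top & 1 else x for x in rows]
    return rank


def min_rank_decode(Hs, y, N, n):
    """Naive min-rank decoding over F_2 (steps 1-4 of Section VIII-A): for r = 1, 2, ...,
    enumerate every basis U in F_2^{N x r}; for each, solve <H_a, U V^T> = y_a for V."""
    if not any(y):
        return [[0] * n for _ in range(N)]  # X = 0 (rank 0) is already consistent
    for r in range(1, min(N, n) + 1):
        for bits in itertools.product((0, 1), repeat=N * r):
            U = [bits[i * r:(i + 1) * r] for i in range(N)]
            rows = []
            for Ha, ya in zip(Hs, y):
                # Coefficient of V[j][l] is (H_a^T U)[j][l]; variable index is j*r + l.
                coeffs = 0
                for j in range(n):
                    for l in range(r):
                        if sum(Ha[i][j] & U[i][l] for i in range(N)) % 2:
                            coeffs |= 1 << (j * r + l)
                rows.append((coeffs, ya))
            V = solve_gf2(rows, n * r)
            if V is not None:
                return [[sum(U[i][l] & V[j * r + l] for l in range(r)) % 2
                         for j in range(n)] for i in range(N)]
    return None


# Toy instance (assumed): a rank-1 X = u v^T in F_2^{3 x 3} and k = 8 random measurements.
random.seed(2)
N = n = 3
u, v = [1, 0, 1], [0, 1, 1]
X = [[ui & vj for vj in v] for ui in u]
k = 8
Hs = [[[random.randrange(2) for _ in range(n)] for _ in range(N)] for _ in range(k)]
y = [sum(Ha[i][j] & X[i][j] for i in range(N) for j in range(n)) % 2 for Ha in Hs]
X_hat = min_rank_decode(Hs, y, N, n)
print(X_hat, gf2_rank(X_hat))
```

Because $r$ is increased only after every basis of the current size fails, the first consistent pair yields a matrix of minimum rank, though not necessarily the planted $X$ when $k$ is too small.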

IX. Discussion and Conclusion 

In this section, we elaborate on connections of our work to the related works mentioned in the introduction and in Tables I and II. We will also conclude the paper by summarizing our main contributions and suggesting avenues for future research.

A. Comparison to existing coding-theoretic techniques for 
rank minimization over finite fields 

In general, solving the min-rank decoding problem (41) is intractable (NP-hard). However, it is known that if the linear operator $\mathcal{H}$ (in (40)) characterizing the code $\mathcal{C}$ admits a favorable algebraic structure, then one can efficiently (i.e., in polynomial time) estimate a sufficiently low-rank $x$ (a vector with elements in the extension field $\mathbb{F}_{q^n}$ or a matrix with elements in $\mathbb{F}_q$), and thus the codeword $c$, from the received word $r$. For instance, the class of Gabidulin codes [7], [8], which are rank-metric analogs of Reed-Solomon codes, not only achieves the Singleton bound and thus has maximum rank distance

Fig. 5. Probabilistic crisscross error patterns [17]: The figure shows an error matrix $X$. The non-zero values (indicated as black dots) are restricted to two columns and one row. Thus, the rank of the error matrix $X$ is at most three.

(MRD), but decoding can also be achieved using a modified form of the Berlekamp-Massey algorithm (see [45], for example). However, the algebraic structure of the codes (and in particular the mutual dependence between the equivalent $H_a$ matrices) does not permit the line of analysis we adopted. Thus, it is unclear how many linear measurements would be required in order to guarantee recovery using the suggested code structure. Silva, Kschischang and Kötter [10] extended the Berlekamp-Massey-based algorithm to handle errors and erasures for the purpose of error control in random linear network coding. In both these cases, the underlying error matrix is assumed to be deterministic and the algebraic structure on the parity-check matrix permitted efficient decoding based on error locators.

In another related work, Montanari and Urbanke [11] assumed that the error matrix $X$ is drawn uniformly at random from all matrices of known rank $r$. The authors then constructed a sparse parity-check code (based on a sparse factor graph). Using an "error-trapping" strategy, by constraining codewords to have rows that have zero Hamming weight without any loss of rate, they first learned the row space of $X$ before adopting a (subspace) message-passing strategy to complete the reconstruction. However, the dependence across rows of the parity-check matrix (caused by lifting) violates the independence assumptions needed for our analyses to hold. The ideas in [11] were subsequently extended by Silva, Kschischang and Kötter [18], where the authors computed the information capacity of various (additive and/or multiplicative) matrix-valued channels over finite fields. They also devised "error-trapping" codes to achieve capacity. However, unlike this work, it is assumed in [18] that the underlying low-rank error matrix is chosen uniformly. As such, their guarantees do not apply to so-called crisscross error patterns [17], [45] (see Fig. 5), which are of interest in data storage applications.

Our work in this paper is focused primarily on understanding the fundamental limits of rank-metric codes that are random. More precisely, the codes are characterized by either dense or sparse sensing (parity-check) matrices. This is in contrast to the literature on rank-metric codes (except [9, Sec. 5]), in which deterministic constructions predominate. The codes presented in Section VII are random. However, in analogy to the random coding argument for channel coding [35, Sec. 7.7], if the ensemble of random codes has low average error probability, there exists a deterministic code that has low error probability. In addition, the strong recovery results in Section VII-C allow us to conclude that our analyses apply to all low-rank matrices $X$ in both the equiprobable and sparse settings. This completes all remaining entries in Table II.

Yet another line of research on rank minimization over finite fields (in particular over $\mathbb{F}_2$) has been conducted by the combinatorial optimization and graph theory communities. In [33, Sec. 6] and [46, Sec. 1], for example, it was demonstrated that if the code (or set of linear constraints) is characterized by a perfect graph,$^{15}$ then the rank minimization problem can be solved exactly and in polynomial time by the ellipsoid method (since the problem can be stated as a semidefinite program). In fact, the rank minimization problem is also intimately related to Lovász's $\vartheta$ function [47, Theorem 4], which bounds the Shannon capacity of a graph.

B. Conclusion and Future Directions 

In this paper, we derive information-theoretic limits for recovering a low-rank matrix with elements over a finite field given noiseless or noisy linear measurements. We show that even if the random sensing (or parity-check) matrices are very sparse, decoding can be done with exactly the same number of measurements as when the sensing matrices are dense. We then adopt a coding-theoretic approach and derive minimum rank distance properties of sparse random rank-metric codes. These results provide geometric insights as to how and why decoding succeeds when sufficiently many measurements are available. The work herein could potentially lead to the design of low-complexity sparse codes for rank-metric channels.

It is also of interest to analyze whether the sparsity factor of $\Omega(\frac{\log n}{n})$ is the smallest possible and whether there is a fundamental tradeoff between this sparsity factor and the number of measurements required for reliable recovery of the low-rank matrix. Additionally, in many of the applications that motivate this problem, the sensing matrices are fixed by the application and will not be random; take, for example, deterministic parity-check matrices that might define a rank-metric code. In rank minimization over the real field, there are properties of the sensing matrices, and of the underlying matrix being estimated, that can be checked (for example, the restricted isometry property [6, Eq. (1)], or random point sampling jointly with incoherence of the low-rank matrix) that, if satisfied, guarantee that the true matrix of interest can be recovered using convex programming. It is of interest to identify an analog in the finite field, that is, a necessary (or sufficient) condition on the sensing matrices and the underlying matrix such that recovery is guaranteed. We would like to develop tractable algorithms along the lines of those in Table I or in the work by Baron et al. [26] to solve the min-rank optimization problem approximately for particular classes of sensing matrices such as the sparse random ensemble.

Finally, Dimakis and Vontobel [48] make an intriguing connection between linear programming (LP) decoding for channel coding and LP decoding for compressed sensing. They reach known compressed sensing results via a new path

15 A perfect graph $G$ is one in which each induced subgraph $H \subseteq G$ has a chromatic number $\chi(H)$ that is the same as its clique number $\omega(H)$.

namely, channel coding. Analogously, we wonder whether known rank minimization results can be derived using rank-metric coding tools, thereby providing novel interpretations. Just as in [48], the reverse direction is also open, that is, whether the growing literature and understanding of rank minimization problems could be leveraged to design more tractable and interesting decoding approaches for rank-metric codes.

Appendix A
Proof of Lemma 6

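Proof: It suffices to show that the conditional probability $\mathbb{P}(\mathcal{A}_{Z'} \mid \mathcal{A}_Z) = \mathbb{P}(\mathcal{A}_{Z'}) = q^{-k}$ for $Z \neq Z'$. We define the non-zero matrices $M := X - Z$ and $M' := X - Z'$. Let $\mathcal{K} := \mathrm{supp}(M' - M)$ and $\mathcal{L} := \mathrm{supp}(M)$. The idea of the proof is to partition the joint support $\mathcal{K} \cup \mathcal{L}$ into disjoint sets. More precisely, consider

$$\mathbb{P}(\mathcal{A}_{Z'} \mid \mathcal{A}_Z) \stackrel{(a)}{=} \mathbb{P}\big(\langle M', H_1\rangle = 0 \mid \langle M, H_1\rangle = 0\big)^{k} \stackrel{(b)}{=} \mathbb{P}\big(\langle M' - M, H_1\rangle = 0 \mid \langle M, H_1\rangle = 0\big)^{k}, \tag{68}$$

where (a) is from the definition of $\mathcal{A}_Z := \{\langle X - Z, H_a\rangle = 0, \forall\, a \in [k]\}$ and the independence of the random matrices $H_a$, $a \in [k]$, and (b) is by linearity. It suffices to show that the probability in (68) is $q^{-1}$. Indeed,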

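$$\begin{aligned}
\mathbb{P}\big(\langle M' - M, H_1\rangle = 0 \mid \langle M, H_1\rangle = 0\big)
&\stackrel{(c)}{=} \mathbb{P}\Big(\sum_{(i,j)\in\mathcal{K}} [M' - M]_{i,j}[H_1]_{i,j} = 0 \,\Big|\, \sum_{(i,j)\in\mathcal{L}} [M]_{i,j}[H_1]_{i,j} = 0\Big) \\
&\stackrel{(d)}{=} \mathbb{P}\Big(\sum_{(i,j)\in\mathcal{K}} [H_1]_{i,j} = 0 \,\Big|\, \sum_{(i,j)\in\mathcal{L}} [H_1]_{i,j} = 0\Big),
\end{aligned} \tag{69}$$

where (c) is from the definition of the inner product and the sets $\mathcal{K}$ and $\mathcal{L}$, and (d) from the fact that $[M]_{i,j}[H_1]_{i,j}$ has the same distribution as $[H_1]_{i,j}$ since $[M]_{i,j} \neq 0$ and $[H_1]_{i,j}$ is uniformly distributed in $\mathbb{F}_q$. Now, we split the sets $\mathcal{K}$ and $\mathcal{L}$ in (69) into two disjoint subsets each, obtaining

$$\begin{aligned}
\mathbb{P}\big(\langle M' - M, H_1\rangle = 0 \mid \langle M, H_1\rangle = 0\big)
&= \mathbb{P}\Big(\sum_{(i,j)\in\mathcal{K}\setminus\mathcal{L}} [H_1]_{i,j} + \sum_{(i,j)\in\mathcal{K}\cap\mathcal{L}} [H_1]_{i,j} = 0 \,\Big|\, \sum_{(i,j)\in\mathcal{L}\setminus\mathcal{K}} [H_1]_{i,j} + \sum_{(i,j)\in\mathcal{L}\cap\mathcal{K}} [H_1]_{i,j} = 0\Big) \\
&\stackrel{(e)}{=} \mathbb{P}\Big(\sum_{(i,j)\in\mathcal{K}\setminus\mathcal{L}} [H_1]_{i,j} = \sum_{(i,j)\in\mathcal{L}\setminus\mathcal{K}} [H_1]_{i,j} \,\Big|\, \sum_{(i,j)\in\mathcal{L}\setminus\mathcal{K}} [H_1]_{i,j} = -\sum_{(i,j)\in\mathcal{L}\cap\mathcal{K}} [H_1]_{i,j}\Big) \stackrel{(f)}{=} q^{-1}.
\end{aligned}$$

Equality (e) is by using the condition $\sum_{(i,j)\in\mathcal{L}\setminus\mathcal{K}} [H_1]_{i,j} = -\sum_{(i,j)\in\mathcal{L}\cap\mathcal{K}} [H_1]_{i,j}$, and finally (f) from the fact that the sets $\mathcal{K}\setminus\mathcal{L}$, $\mathcal{L}\setminus\mathcal{K}$ and $\mathcal{L}\cap\mathcal{K}$ are mutually disjoint, so the probability is $q^{-1}$ by independence and uniformity of $[H_1]_{i,j}$, $(i,j) \in [n]^2$. ■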

Appendix B 
Proof of Proposition 8

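Proof: Recall the optimization problem for the noisy case in (32), where the optimization variables are $X$ and $\mathbf{w}$. Let $\mathcal{S}^{\mathrm{noisy}} \subset \mathbb{F}_q^{n\times n} \times \mathbb{F}_q^{k}$ be the set of optimizers. In analogy to (13), we define the "noisy" error event

$$\mathcal{E}^{\mathrm{noisy}}_n := \{|\mathcal{S}^{\mathrm{noisy}}| > 1\} \cup \big(\{|\mathcal{S}^{\mathrm{noisy}}| = 1\} \cap \{(X^*, \mathbf{w}^*) \neq (X, \mathbf{w})\}\big).$$

Note that if $(\mathcal{E}^{\mathrm{noisy}}_n)^c$ occurs, both the matrix $X$ and the noise vector $\mathbf{w}$ are recovered, so, in fact, we are decoding two objects when we are only interested in $X$. Clearly, $\mathcal{E}_n \subset \mathcal{E}^{\mathrm{noisy}}_n$, so it suffices to upper bound $\mathbb{P}(\mathcal{E}^{\mathrm{noisy}}_n)$ to obtain an upper bound on $\mathbb{P}(\mathcal{E}_n)$. For this purpose, consider the event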

A»™ y := {(Z, H„> = (X, H B ) +v a ,Vae [k]}, (70) 

defined for each matrix- vector pair (Z,v) e F™ x ™ x such 
that rank(Z) + A||v||o < rank(X) + A||w||o- The error event 
£noisy occurs if anc j on jy jf there exists a pair (Z, v) ^ (X, w) 
such that (i) rank(Z) + A||v|| < rank(X) + A||w|| and (ii) 
the event A^"^ y occurs. By the union of events bound, the 
error probability can be bounded as: 

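$$P(\mathcal{E}_n^{\mathrm{noisy}}) \le \sum_{\substack{(Z,v)\neq(X,w):\\ \mathrm{rank}(Z)+\lambda\|v\|_0 \le \mathrm{rank}(X)+\lambda\|w\|_0}} P(\mathcal{A}^{\mathrm{noisy}}_{Z,v}) \overset{(a)}{\le} \sum_{\substack{(Z,v):\\ \mathrm{rank}(Z)+\lambda\|v\|_0 \le \mathrm{rank}(X)+\lambda\|w\|_0}} q^{-k} \overset{(b)}{\le} q^{-k}\,|\mathcal{U}_{r,s}|, \qquad (71)$$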

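where (a) follows from the same argument as in the noiseless case [see (18)]. In (b) we defined the set $\mathcal{U}_{r,s} := \{(Z,v) : \mathrm{rank}(Z) + \lambda\|v\|_0 \le r + \lambda s\}$. This set contains every pair in the sum because $\mathrm{rank}(X) \le r$ and $\|w\|_0 = s$. The subscripts $r$ and $s$ index, respectively, the upper bound on the rank of $X$ and the sparsity of $w$. Note that $s = \|w\|_0 = \lfloor\sigma n^2\rfloor \le \sigma n^2$. It remains to bound the cardinality of $\mathcal{U}_{r,s}$. In the following, we partition the counting argument into disjoint subsets by fixing the sparsity of the vector $v$ to be equal to $l$, for all possible values of $l$. Note that $0 \le l \le r/\lambda + s$, since $\mathrm{rank}(Z) \ge 0$. The cardinality of $\mathcal{U}_{r,s}$ is bounded as follows:

$$|\mathcal{U}_{r,s}| \le \sum_{l=0}^{r/\lambda+s} \big|\{v\in\mathbb{F}_q^k : \|v\|_0 = l\}\big| \times \big|\{Z\in\mathbb{F}_q^{n\times n} : \mathrm{rank}(Z) \le r + \lambda(s-l)\}\big|$$
$$\overset{(a)}{\le} \sum_{l=0}^{r/\lambda+s} \binom{k}{l}(q-1)^l \cdot 4q^{2n[r+\lambda(s-l)] - [r+\lambda(s-l)]^2}$$
$$\overset{(b)}{\le} 4\Big(\frac{r}{\lambda}+s+1\Big)\binom{k}{r/\lambda+s}(q-1)^{r/\lambda+s}\, q^{2n(r+\lambda s)-(r+\lambda s)^2}$$
$$\overset{(c)}{\le} 4\Big(\frac{r}{\lambda}+s+1\Big)\, 2^{kH_2\left(\frac{r/\lambda+s}{k}\right)}\, q^{r/\lambda+s}\, q^{2n(r+\lambda s)-(r+\lambda s)^2},$$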



where (a) follows by bounding the number of vectors that are nonzero in exactly $l$ positions and the number of matrices whose rank is no greater than $r + \lambda(s-l)$ (Lemma 1). Inequality (b) follows by first noting that the map $r \mapsto 2nr - r^2$ is monotonically increasing in $r = 0, 1, \ldots, n$, and then upper bounding each summand by its largest possible value. Observe that (33) ensures that $r/\lambda + s \le k/2$. This is needed to upper bound the binomial coefficient, since $l \mapsto \binom{k}{l}$ is increasing for $l \le k/2$. Inequality (c) uses the fact that the binomial coefficient is upper bounded by a function of the binary entropy [35, Theorem 11.1.3], namely $\binom{k}{l} \le 2^{kH_2(l/k)}$, together with $(q-1)^{r/\lambda+s} \le q^{r/\lambda+s}$. Now, since $r/n \to \gamma$, for every $\eta > 0$ we have $|r/n - \gamma| < \eta$ for $n$ sufficiently large. Define $\gamma_\eta := \gamma + \eta + \sigma$. From (c) above, $|\mathcal{U}_{r,s}|$ can be further upper bounded as

$$|\mathcal{U}_{r,s}| \overset{(d)}{\le} 4(\gamma_\eta n^2 + 1)\, 2^{kH_2\left(\frac{\gamma_\eta n^2}{k}\right)}\, q^{\gamma_\eta n^2}\, q^{(2\gamma_\eta - \gamma_\eta^2)n^2} \qquad (72)$$
$$\overset{(e)}{\le} O(n^2)\, 2^{kH_2\left(\frac{1}{3-\gamma_\eta}\right)}\, q^{(3\gamma_\eta - \gamma_\eta^2)n^2}. \qquad (73)$$



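Inequality (d) follows from three facts: the problem assumption that $\mathrm{rank}(X) \le r \le (\gamma+\eta)n$ for $n$ sufficiently large, the bound $\|w\|_0 = s \le \sigma n^2$, and the choice of the regularization parameter $\lambda = 1/n$. Together these give $r/\lambda + s \le \gamma_\eta n^2$ and $r + \lambda s \le \gamma_\eta n$. Inequality (e) follows because $k$ satisfies (33), so $k > 3\gamma_\eta(1-\gamma_\eta/3)n^2$. Hence $\gamma_\eta n^2/k < 1/(3-\gamma_\eta) \le 1/2$ (for $\gamma_\eta \le 1$), and since $H_2$ is increasing on $[0, 1/2]$, the binary entropy term in (72) can be upper bounded as in (73). By combining (71) and (73), we observe that the error probability $P(\mathcal{E}_n^{\mathrm{noisy}})$ can be upper bounded as

$$P(\mathcal{E}_n^{\mathrm{noisy}}) \le O(n^2)\, q^{-n^2\left[\frac{k}{n^2}\left(1 - (\log_q 2)\, H_2\left(\frac{1}{3-\gamma_\eta}\right)\right) - 3\gamma_\eta + \gamma_\eta^2\right]}. \qquad (74)$$

Now, again by the assumption that $k$ satisfies (33), the bracketed term in the exponent of (74) is positive for $\eta$ sufficiently small ($\gamma_\eta \to \gamma + \sigma$ as $\eta \to 0$). Hence $P(\mathcal{E}_n^{\mathrm{noisy}}) \to 0$ as $n \to \infty$. ■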
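Steps (b) and (c) in the bound on $|\mathcal{U}_{r,s}|$ rest on two elementary facts about binomial coefficients: $\binom{k}{l} \le 2^{kH_2(l/k)}$, and the monotonicity of $l \mapsto \binom{k}{l}$ for $l \le k/2$. The short Python check below verifies both numerically for small $k$. It is an illustration, not a proof, and the helper names are hypothetical.

```python
import math


def binary_entropy(x):
    """H_2(x) in bits, with the convention H_2(0) = H_2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def entropy_bound_holds(k):
    """Check binom(k, l) <= 2^{k H_2(l/k)} for l = 0, ..., k (compared in log2)."""
    return all(
        math.log2(math.comb(k, l)) <= k * binary_entropy(l / k) + 1e-9
        for l in range(k + 1)
    )


def increasing_up_to_half(k):
    """Check binom(k, l) <= binom(k, l + 1) whenever l + 1 <= k / 2."""
    return all(math.comb(k, l) <= math.comb(k, l + 1) for l in range(k // 2))
```

Comparing logarithms rather than the integers themselves keeps the check cheap for larger $k$. The small tolerance only absorbs floating-point rounding at the endpoints $l = 0$ and $l = k$, where both sides vanish.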



Appendix C 
Proof of Corollary 9

Proof: Fano's inequality can be applied to obtain inequality (a) as in (10). We lower bound the term $H(X \mid y^k, H^k)$ in (10) differently, taking into account the stochastic noise. It can be expressed as

$$H(X \mid y^k, H^k) = H(X) - H(y^k \mid H^k) + H(y^k \mid H^k, X). \qquad (75)$$

The second term can be upper bounded as $H(y^k \mid H^k) \le k$ by (11). The third term, which is zero in the noiseless case, can be (more tightly) lower bounded as follows:

$$H(y^k \mid H^k, X) = kH(y_1 \mid H_1, X) \overset{(a)}{=} kH(w_1) \overset{(b)}{\ge} kH_q(p), \qquad (76)$$

The first equality holds because the pairs $(H_a, w_a)$ are i.i.d. across $a$. Step (a) follows from the independence of $(X, H_1)$ and $w_1$. Step (b) follows from the fact that the entropy of $w_1$, whose pmf is given in (34), is lower bounded by putting all the remaining probability mass $p$ on a single symbol in $\mathbb{F}_q \setminus \{0\}$ (i.e., a $\mathrm{Bern}(p)$ distribution). Note that logarithms are to the base $q$. The result follows by combining (75), (76) and the lower bound in (10).
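Step (b) of (76) uses the grouping property of entropy: any pmf on $\mathbb{F}_q$ with mass $1-p$ at zero has entropy at least $H_q(p)$, with equality when the remaining mass $p$ sits on a single nonzero symbol. The sketch below checks this numerically. Since the exact pmf (34) is not restated here, it assumes only that $P(w_1 = 0) = 1 - p$. It then tries a uniform split, a random split and a single-symbol split of the mass $p$ over the nonzero symbols. `noise_pmf` is a hypothetical helper.

```python
import math
import random


def entropy_base_q(pmf, q):
    """Entropy of a pmf (list of probabilities) with logarithms to the base q."""
    return -sum(p * math.log(p, q) for p in pmf if p > 0)


def hq(p, q):
    """H_q(p): entropy of a Bern(p) pmf, logarithms to the base q."""
    return entropy_base_q([1 - p, p], q)


def noise_pmf(p, weights):
    """Hypothetical pmf on F_q: mass 1 - p at zero, mass p split over the q - 1
    nonzero symbols in proportion to `weights`."""
    total = sum(weights)
    return [1 - p] + [p * w / total for w in weights]


if __name__ == "__main__":
    q, p = 5, 0.2
    print(entropy_base_q(noise_pmf(p, [1] * (q - 1)), q), hq(p, q))
```

The uniform split gives the largest entropy, and concentrating the mass on one symbol recovers $H_q(p)$ exactly. This is why $kH_q(p)$ is a valid lower bound regardless of how (34) distributes the nonzero mass.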



Appendix D 
Proof of Corollary 10

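Proof: The main idea in the proof is to reduce the problem to the deterministic case and apply Proposition 8. For this purpose, we define the $\zeta$-typical set for the noise vector $w$ of length $k = \lceil\alpha n^2\rceil$ as

$$\mathcal{T}_\zeta = \mathcal{T}_\zeta(w) := \left\{ w \in \mathbb{F}_q^k : \left| \frac{\|w\|_0}{\alpha n^2} - p \right| \le \zeta \right\}.$$

We choose $\zeta$ to depend on $n$ in the following way (cf. the Delta-convention [49]): $\zeta_n \to 0$ and $n\zeta_n \to \infty$ (e.g., $\zeta_n = n^{-1/2}$). Since the entries of $w$ are i.i.d. with pmf (34), each nonzero with probability $p$, $\|w\|_0$ has mean $kp$ and variance $kp(1-p) = O(n^2)$. By Chebyshev's inequality, therefore, $P(w \notin \mathcal{T}_{\zeta_n}) = O(1/(n\zeta_n)^2) \to 0$ as $n \to \infty$. We now bound the probability of the event that the estimated matrix differs from the true one. To do so, we use the law of total probability to condition the error event $\mathcal{E}_n^{\mathrm{noisy}}$ on the event $\{w \in \mathcal{T}_{\zeta_n}\}$ and its complement: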

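$$P(\mathcal{E}_n^{\mathrm{noisy}}) \le P(\mathcal{E}_n^{\mathrm{noisy}} \mid w \in \mathcal{T}_{\zeta_n}) + P(w \notin \mathcal{T}_{\zeta_n}). \qquad (77)$$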

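Since the second term in (77) converges to zero, it suffices to prove that the first term also converges to zero. For this purpose, we can follow the steps of the proof of Proposition 8, in particular the steps leading to (72) and (74). Doing so and defining $p_\zeta := p + \zeta$, we arrive at the upper bound

$$P(\mathcal{E}_n^{\mathrm{noisy}} \mid w \in \mathcal{T}_{\zeta_n}) \le O(n^2)\, 2^{kH_2\left(p_{\zeta_n} + \frac{\gamma}{\alpha}\right)}\, q^{2n^2(\gamma + p_{\zeta_n}\alpha) - (\gamma n + p_{\zeta_n}\alpha n)^2 - \alpha n^2}$$
$$\le O(n^2)\, q^{-n^2\left[\alpha - \alpha(\log_q 2)\, H_2\left(p_{\zeta_n} + \frac{\gamma}{\alpha}\right) - 2\alpha p_{\zeta_n}(1-\gamma) + \alpha^2 p_{\zeta_n}^2 - 2\gamma + \gamma^2\right]}$$
$$= O(n^2)\, q^{-n^2\left[g(\alpha;\, p_{\zeta_n},\, \gamma) - 2\gamma(1-\gamma/2)\right]}, \qquad (78)$$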

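Since $\zeta_n \to 0$ and $g$ defined in (36) is continuous in its second argument, $g(\alpha; p_{\zeta_n}, \gamma) \to g(\alpha; p, \gamma)$. Thus, if $\alpha$ satisfies (37), the bracketed term in the exponent of (78) is positive for all sufficiently large $n$. Hence, $P(\mathcal{E}_n^{\mathrm{noisy}}) \to 0$ as $n \to \infty$, as desired. ■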
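The role of the condition $n\zeta_n \to \infty$ in the Chebyshev step can be seen numerically. The sketch below computes the exact probability that $\|w\|_0/k$ deviates from $p$ by more than $\zeta$ when $\|w\|_0 \sim \mathrm{Binomial}(k, p)$. It then compares this probability with the Chebyshev bound $p(1-p)/(k\zeta^2)$. For simplicity it normalizes by $k$ rather than by $\alpha n^2$; these differ by less than one count. It also assumes i.i.d. noise entries that are nonzero with probability $p$, and uses the illustrative choice $\zeta_n = n^{-1/2}$, $\alpha = 0.5$, $p = 0.1$.

```python
import math


def atypical_prob(k, p, zeta):
    """Exact P(|B/k - p| > zeta) for B ~ Binomial(k, p), with the pmf evaluated in log space."""
    total = 0.0
    for m in range(k + 1):
        if abs(m / k - p) > zeta:
            log_pmf = (math.lgamma(k + 1) - math.lgamma(m + 1) - math.lgamma(k - m + 1)
                       + m * math.log(p) + (k - m) * math.log1p(-p))
            total += math.exp(log_pmf)
    return total


def chebyshev_bound(k, p, zeta):
    """Chebyshev: P(|B/k - p| >= zeta) <= Var(B/k) / zeta^2 = p(1 - p) / (k zeta^2)."""
    return p * (1 - p) / (k * zeta ** 2)


if __name__ == "__main__":
    alpha, p = 0.5, 0.1
    for n in (10, 20, 40):
        k = math.ceil(alpha * n * n)  # number of measurements / noise entries
        zeta = n ** -0.5  # zeta_n -> 0 while n * zeta_n -> infinity
        print(n, atypical_prob(k, p, zeta), chebyshev_bound(k, p, zeta))
```

With $k \approx \alpha n^2$ and $\zeta_n = n^{-1/2}$, the Chebyshev bound scales like $p(1-p)/(\alpha n)$. It therefore vanishes, as the proof requires, and the exact tail is far smaller still.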



Appendix E 
Proof of Theorem 11

Proof: We first state a lemma, which will be proven at the end of this section.

Lemma 21. Define $d := \|X - Z\|_0$. The probability of $\mathcal{A}_Z$, defined in (16), under the $\delta$-sparse measurement model, is denoted $\theta(d;\delta,q,k)$. It is a function of $d$ only and is given by

$$\theta(d;\delta,q,k) = \left[q^{-1} + (1-q^{-1})\left(1 - \frac{\delta}{1-q^{-1}}\right)^d\right]^k. \qquad (79)$$

Lemma 21 says that the probability $P(\mathcal{A}_Z)$ depends on $X$ only through the number of entries in which it differs from $Z$, namely $d$. Furthermore, it is easy to check that the probability in (79) satisfies the following two properties:

1) $\theta(d;\delta,q,k) \le (1-\delta)^k \le \exp(-k\delta)$ for all $d \in [n^2]$.

2) $\theta(d;\delta,q,k)$ is a monotonically decreasing function of $d$.
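The closed form (79) and the two properties above are easy to check numerically. The Python sketch below evaluates $\theta(d;\delta,q,k)$ directly from (79). It assumes $0 < \delta \le 1 - q^{-1}$, so that the base $1 - \delta/(1-q^{-1})$ lies in $[0, 1)$; this is the regime in which property 2 holds. The function name `theta` is illustrative.

```python
import math


def theta(d, delta, q, k):
    """theta(d; delta, q, k) from (79): probability that X and Z, at Hamming
    distance d, agree on all k delta-sparse measurements.
    Assumes 0 < delta <= 1 - 1/q."""
    psi = (1 - delta / (1 - 1 / q)) ** d
    return (1 / q + (1 - 1 / q) * psi) ** k


if __name__ == "__main__":
    # theta decreases in d and flattens out near q^{-k}
    for d in (1, 5, 20, 100):
        print(d, theta(d, 0.05, 2, 100))
```

At $d = 1$ the bracket in (79) equals $1 - \delta$ exactly, which is where property 1 is tight. As $d$ grows, $\theta$ decreases toward $q^{-k}$, the value for dense uniform sensing matrices.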
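We now upper bound the probability in (17). To do so, we partition all possibly misleading matrices $Z$ into subsets based on their Hamming distance from $X$. Our idea is to bound two groups of partitions separately. Partitions with low Hamming distance are few, so a loose upper bound on $\theta(d;\delta,q,k)$ suffices for them. Partitions further from $X$ are many, but for them we can obtain a tight upper bound on $\theta(d;\delta,q,k)$ that depends only on the Hamming distance $\lceil\beta n^2\rceil$. We then optimize the split over the free parameter $\beta$:

$$P(\mathcal{E}_n) \le \sum_{d=1}^{n^2}\ \sum_{\substack{Z:\, Z\neq X,\ \mathrm{rank}(Z)\le\mathrm{rank}(X)\\ \|X-Z\|_0 = d}} P(\mathcal{A}_Z)$$
$$\overset{(a)}{=} \sum_{d=1}^{\lceil\beta n^2\rceil-1}\ \sum_{\substack{Z:\, Z\neq X,\ \mathrm{rank}(Z)\le\mathrm{rank}(X)\\ \|X-Z\|_0 = d}} \theta(d;\delta,q,k) + \sum_{d=\lceil\beta n^2\rceil}^{n^2}\ \sum_{\substack{Z:\, Z\neq X,\ \mathrm{rank}(Z)\le\mathrm{rank}(X)\\ \|X-Z\|_0 = d}} \theta(d;\delta,q,k)$$
$$\overset{(b)}{\le} \sum_{d=1}^{\lceil\beta n^2\rceil-1}\ \sum_{\substack{Z:\, Z\neq X,\ \mathrm{rank}(Z)\le\mathrm{rank}(X)\\ \|X-Z\|_0 = d}} \exp(-k\delta) + \sum_{d=\lceil\beta n^2\rceil}^{n^2}\ \sum_{\substack{Z:\, Z\neq X,\ \mathrm{rank}(Z)\le\mathrm{rank}(X)\\ \|X-Z\|_0 = d}} \theta(\lceil\beta n^2\rceil;\delta,q,k)$$
$$\overset{(c)}{\le} \big|\{Z : \|Z-X\|_0 \le \lfloor\beta n^2\rfloor\}\big|\exp(-k\delta) + n^2\,\big|\{Z : \mathrm{rank}(Z)\le\mathrm{rank}(X)\}\big|\,\theta(\lceil\beta n^2\rceil;\delta,q,k). \qquad (80)$$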

In (a), we used the definition of $\theta(d;\delta,q,k)$ in Lemma 21. The fractional parameter $\beta$, which we choose later, may depend on $n$. In (b), we used the fact that $\theta(d;\delta,q,k) \le \exp(-k\delta)$ and that $\theta(d;\delta,q,k)$ is monotonically decreasing in $d$, so $\theta(d;\delta,q,k) \le \theta(\lceil\beta n^2\rceil;\delta,q,k)$ for all $d \ge \lceil\beta n^2\rceil$. In (c), we upper bounded the cardinality of the set $\{Z \neq X : \mathrm{rank}(Z) \le \mathrm{rank}(X),\ \|X-Z\|_0 < \lceil\beta n^2\rceil\}$ by the cardinality of the set of matrices that differ from $X$ in no more than $\lfloor\beta n^2\rfloor$ locations (neglecting the rank constraint). For the second term, consider the sets $\mathcal{M}_d := \{Z \neq X : \mathrm{rank}(Z) \le \mathrm{rank}(X),\ \|X-Z\|_0 = d\}$. We upper bounded the cardinality of each $\mathcal{M}_d$ by the cardinality of the set of matrices whose rank is no more than $\mathrm{rank}(X)$ (neglecting the Hamming weight constraint), and used the fact that there are at most $n^2$ values of $d$.
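Step (c) relies on a counting bound, which is also used for $A_n$ below. A Hamming ball of radius $\lfloor\beta N\rfloor$ in $\mathbb{F}_q^N$ (here $N = n^2$) has at most $2^{NH_2(\beta)}(q-1)^{\beta N}$ elements when $\beta \le 1/2$. The sketch below compares the exact ball size with this bound, in the $\log_2$ domain, for a few small cases. The helper names are hypothetical.

```python
import math


def hamming_ball_size(N, radius, q):
    """Number of vectors in F_q^N (here N = n^2 matrix entries) within Hamming
    distance `radius` of a fixed vector; the count does not depend on that vector."""
    return sum(math.comb(N, d) * (q - 1) ** d for d in range(radius + 1))


def hamming_ball_bound_log2(N, beta, q):
    """log2 of 2^{N H_2(beta)} (q - 1)^{beta N}; the bound is valid for 0 < beta <= 1/2."""
    h = -beta * math.log2(beta) - (1 - beta) * math.log2(1 - beta)
    return N * h + beta * N * math.log2(q - 1)


if __name__ == "__main__":
    N, q, beta = 64, 3, 0.1
    exact = math.log2(hamming_ball_size(N, math.floor(beta * N), q))
    print(exact, hamming_ball_bound_log2(N, beta, q))
```

The bound follows from $(q-1)^d \le (q-1)^{\beta N}$ for $d \le \beta N$ and the standard estimate $\sum_{d \le \beta N}\binom{N}{d} \le 2^{NH_2(\beta)}$. Neither step depends on $X$.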



We denote the first and second terms in (80) by $A_n$ and $B_n$, respectively. Now,

$$A_n := \big|\{Z : \|Z-X\|_0 \le \lfloor\beta n^2\rfloor\}\big|\exp(-k\delta) \overset{(a)}{\le} 2^{n^2 H_2(\beta)}(q-1)^{\beta n^2}\exp(-k\delta) = 2^{n^2\left[H_2(\beta) + \beta\log_2(q-1) - \frac{k}{n^2}\delta\log_2(e)\right]}, \qquad (81)$$

where (a) uses the fact that the number of matrices that differ from $X$ in at most $\lfloor\beta n^2\rfloor$ positions is upper bounded by $2^{n^2 H_2(\beta)}(q-1)^{\beta n^2}$ (for $\beta \le 1/2$). Note that this upper bound is independent of $X$. Now fix $\eta > 0$ and consider $B_n$:

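$$B_n := n^2\big|\{Z : \mathrm{rank}(Z) \le \mathrm{rank}(X)\}\big|\,\theta(\lceil\beta n^2\rceil;\delta,q,k) \overset{(a)}{\le} 4n^2 q^{(2\gamma(1-\gamma/2)+\eta)n^2}\,\theta(\lceil\beta n^2\rceil;\delta,q,k)$$
$$\overset{(b)}{=} 4n^2 q^{n^2\left[2\gamma(1-\gamma/2) + \eta + \frac{k}{n^2}\log_q\left(q^{-1} + (1-q^{-1})\left(1-\frac{\delta}{1-q^{-1}}\right)^{\lceil\beta n^2\rceil}\right)\right]}. \qquad (82)$$

In (a), we used the fact that the number of matrices of rank no greater than $r$ is bounded above by $4q^{(2\gamma(1-\gamma/2)+\eta)n^2}$ (Lemma 1) for $n$ sufficiently large; how large depends on $\eta$, through the convergence of $r/n$ to $\gamma$. Equality (b) is obtained by applying (79) in Lemma 21.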
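Step (a) of (82) relies on counting matrices by rank. The sketch below uses the classical product formula $\Phi_q(n,r) = \prod_{i=0}^{r-1}(q^n - q^i)^2/(q^r - q^i)$ for the number of $n \times n$ matrices over $\mathbb{F}_q$ of rank exactly $r$. This formula is a standard fact and is not restated from Lemma 1. For small $n$ and $q$, the sketch checks numerically that both this count and the number of matrices of rank at most $r$ stay below $4q^{2nr - r^2}$. With $r \approx \gamma n$, the exponent satisfies $2nr - r^2 \approx 2\gamma(1-\gamma/2)n^2$, which matches (82).

```python
def num_rank_exactly(n, r, q):
    """Phi_q(n, r): number of n x n matrices over F_q with rank exactly r,
    via the classical formula prod_{i<r} (q^n - q^i)^2 / (q^r - q^i)."""
    num, den = 1, 1
    for i in range(r):
        num *= (q ** n - q ** i) ** 2
        den *= q ** r - q ** i
    return num // den  # exact: the ratio is always an integer


def num_rank_at_most(n, r, q):
    """Number of n x n matrices over F_q with rank at most r."""
    return sum(num_rank_exactly(n, j, q) for j in range(r + 1))


def four_q_bound(n, r, q):
    """The upper bound 4 q^{2nr - r^2}."""
    return 4 * q ** (2 * n * r - r * r)


if __name__ == "__main__":
    n, q = 6, 2
    for r in range(n + 1):
        print(r, num_rank_at_most(n, r, q), four_q_bound(n, r, q))
```

Python integers are exact, so the comparison involves no rounding. The sanity check that all ranks together account for $q^{n^2}$ matrices confirms the counting formula.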

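Our objective in the rest of the proof is to find sufficient conditions on $k$ and $\beta$ so that (81) and (82) both converge to zero. We start with $B_n$. From (82), we observe the following. Suppose that, for every $\epsilon > 0$, there exists an $N_{1,\epsilon} \in \mathbb{N}$ such that

$$k > \frac{\left(1+\frac{\epsilon}{5}\right) 2\gamma(1-\gamma/2)\, n^2}{-\log_q\left(q^{-1} + (1-q^{-1})\left(1-\frac{\delta}{1-q^{-1}}\right)^{\lceil\beta n^2\rceil}\right)} \qquad (83)$$

for all $n > N_{1,\epsilon}$. Then $B_n \to 0$, since the exponent in (82) is negative (for $\eta$ sufficiently small). Now, we claim that if $\lim_{n\to\infty}\lceil\beta n^2\rceil\delta = +\infty$, then the denominator in (83) tends to 1 from below. This is justified as follows. Consider the term

$$\left(1-\frac{\delta}{1-q^{-1}}\right)^{\lceil\beta n^2\rceil} \le \exp\left(-\frac{\lceil\beta n^2\rceil\,\delta}{1-q^{-1}}\right) \le \exp\left(-\lceil\beta n^2\rceil\,\delta\right),$$

so the argument of the logarithm in (83) tends to $q^{-1}$ from above if $\lim_{n\to\infty}\lceil\beta n^2\rceil\delta = +\infty$.

Since $\delta \in \Omega(\log n / n)$ by definition, there exist a constant $C \in (0,\infty)$ and an integer $N_\delta \in \mathbb{N}$ such that

$$\delta = \delta_n \ge C\,\frac{\log_2(n)}{n} \qquad (84)$$

for all $n > N_\delta$. Let $\beta$ be defined as

$$\beta := \frac{2\gamma(1-\gamma/2)\log_2(e)\,\delta}{\log_2(n)}. \qquad (85)$$

Then $\lceil\beta n^2\rceil\delta \ge 2\gamma(1-\gamma/2)\log_2(e)\,C^2\log_2(n) = \Theta(\log n)$, and so the condition $\lim_{n\to\infty}\lceil\beta n^2\rceil\delta = +\infty$ is satisfied. Thus, for sufficiently large $n$, the denominator in (83) exceeds $1/(1+\epsilon/5)$, which is less than 1. As such, the condition in (83) is implied by the following: given the choice of $\beta$ in (85), if there exists an $N_{2,\epsilon} \in \mathbb{N}$ such that

$$k > 2\left(1+\frac{\epsilon}{5}\right)^2\gamma(1-\gamma/2)\, n^2 \qquad (86)$$

for all $n > N_{2,\epsilon}$, then $B_n \to 0$.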

We now revisit the upper bound on $A_n$ in (81). The inequality says the following: for every $\epsilon > 0$, if there exists an $N_{3,\epsilon} \in \mathbb{N}$ such that

$$k > \frac{H_2(\beta) + \beta\log_2(q-1)}{\delta\log_2(e)}\, n^2 \qquad (87)$$

for all $n > N_{3,\epsilon}$, then $A_n \to 0$, since the exponent in (81) is negative. Note that $H_2(\beta)/(-\beta\log_2\beta) \downarrow 1$ as $\beta \downarrow 0$. Hence, if $\beta$ is chosen as in (85), then by using (84) we obtain

$$\lim_{n\to\infty} \frac{H_2(\beta) + \beta\log_2(q-1)}{\delta\log_2(e)} \le 2\gamma(1-\gamma/2). \qquad (88)$$

In particular, for $n$ sufficiently large, the terms of the sequence in (88) and its limit (which exists) differ by less than $2\gamma(1-\gamma/2)\epsilon/5$. Hence (87) is implied by the following: given the choice of $\beta$ in (85), if there exists an $N_{4,\epsilon} \in \mathbb{N}$ such that

$$k > 2\left(1+\frac{\epsilon}{5}\right)^2\gamma(1-\gamma/2)\, n^2 \qquad (89)$$

for all $n > N_{4,\epsilon}$, then the sequence $A_n \to 0$. The choice of $\beta$ in (85) "balances" the two terms $A_n$ and $B_n$ in (80). Also note that $2(1+\epsilon/5)^2 < 2 + \epsilon$ for all $\epsilon \in (0, 5/2)$.

Hence, suppose the number of measurements $k$ satisfies (15) for all $n > N_\epsilon := \max\{N_{1,\epsilon}, N_{2,\epsilon}, N_{3,\epsilon}, N_{4,\epsilon}, N_\delta\}$. Then both (86) and (89) are also satisfied and, consequently, $P(\mathcal{E}_n) \le A_n + B_n \to 0$ as $n \to \infty$, as desired. We remark that the restriction $\epsilon \in (0, 5/2)$ is not a serious one, since the validity of the claim in Theorem 11 for some $\epsilon_0 > 0$ implies the same for all $\epsilon > \epsilon_0$. This completes the proof of Theorem 11. ■
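The choice of $\beta$ in (85) can be explored numerically. The sketch below takes the boundary case $\delta = C\log_2(n)/n$ of (84) and computes $\beta$ from (85). It then evaluates the sequence inside the limit in (88) and the quantity $\lceil\beta n^2\rceil\delta$. The parameters ($\gamma = 0.5$, $q = 2$, $C = 1$) are arbitrary illustrative values. In this case the sequence approaches $2\gamma(1-\gamma/2) = 0.75$ from above, and only slowly (roughly like $1/\log_2 n$). This is why the proof keeps the slack factor $(1+\epsilon/5)$. The last helper checks the elementary inequality $2(1+\epsilon/5)^2 < 2 + \epsilon$ used to absorb that slack.

```python
import math

LOG2E = math.log2(math.e)


def beta_choice(n, gamma, delta):
    """beta as defined in (85)."""
    return 2 * gamma * (1 - gamma / 2) * LOG2E * delta / math.log2(n)


def binary_entropy(x):
    """H_2(x) in bits; log1p keeps the (1 - x) term accurate when x is tiny."""
    return -x * math.log2(x) - (1 - x) * math.log1p(-x) / math.log(2)


def ratio_88(n, gamma, q, C):
    """Sequence inside the limit in (88), with delta = C log2(n) / n (equality in (84))."""
    delta = C * math.log2(n) / n
    beta = beta_choice(n, gamma, delta)
    return (binary_entropy(beta) + beta * math.log2(q - 1)) / (delta * LOG2E)


def ceil_beta_n2_times_delta(n, gamma, C):
    """ceil(beta n^2) * delta, which must diverge for the denominator of (83) to tend to 1."""
    delta = C * math.log2(n) / n
    return math.ceil(beta_choice(n, gamma, delta) * n * n) * delta


def slack_ok(eps):
    """The elementary inequality 2 (1 + eps/5)^2 < 2 + eps."""
    return 2 * (1 + eps / 5) ** 2 < 2 + eps


if __name__ == "__main__":
    for n in (10**3, 10**6, 10**9, 10**12):
        print(n, ratio_88(n, 0.5, 2, 1.0), ceil_beta_n2_times_delta(n, 0.5, 1.0))
```

Larger values of $\delta$ (well above the boundary case) make the ratio smaller, so the boundary case shown here is the slowest to settle.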

It now remains to prove Lemma 21.

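Proof: Recall that $d = \|X - Z\|_0$ and $\theta(d;\delta,q,k) = P(\langle H_a, X\rangle = \langle H_a, Z\rangle,\ a \in [k])$. By the i.i.d. nature of the random matrices $H_a$, $a \in [k]$, we have

$$\theta(d;\delta,q,k) = P(\langle H_1, X\rangle = \langle H_1, Z\rangle)^k.$$

It thus remains to demonstrate that

$$P(\langle H_1, X\rangle = \langle H_1, Z\rangle) = q^{-1} + (1-q^{-1})\left(1 - \frac{\delta}{1-q^{-1}}\right)^d. \qquad (90)$$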

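This may be proved using induction on $d$, but we prove it using more direct transform-domain ideas. Note that $\langle H_1, X - Z\rangle$ is a sum of $d$ independent terms $[X-Z]_{ij}[H_1]_{ij}$. Each of these terms has the same distribution as $[H_1]_{ij}$, because $[X-Z]_{ij} \neq 0$. The probability in (90) is therefore simply the value at zero of the $d$-fold $q$-point circular convolution of the $\delta$-sparse pmf in (39). Let $F \in \mathbb{C}^{q\times q}$ and $F^{-1} \in \mathbb{C}^{q\times q}$ be the discrete Fourier transform (DFT) and inverse DFT matrices, respectively. We use the convention in [50]. Let

$$p := P_h(\cdot;\delta,q) = \left[1-\delta,\ \frac{\delta}{q-1},\ \ldots,\ \frac{\delta}{q-1}\right]^T$$

be the vector of probabilities defined in (39). Then, by properties of the DFT, (90) is simply given by $F^{-1}[(Fp)^d]$ evaluated at the vector's first element. (The notation $v^d := [v_0^d\ \cdots\ v_{q-1}^d]^T$ denotes the vector in which each component of $v$ is raised to the $d$-th power.) We split $p$ into two vectors whose DFTs can be evaluated in closed form:

$$p = \left[1-\delta-\frac{\delta}{q-1},\ 0,\ \ldots,\ 0\right]^T + \left[\frac{\delta}{q-1},\ \frac{\delta}{q-1},\ \ldots,\ \frac{\delta}{q-1}\right]^T.$$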

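Let the first and second vectors above be $p_1$ and $p_2$, respectively. Then, by linearity of the DFT, $Fp = Fp_1 + Fp_2$, where

$$Fp_1 = \left(1-\delta-\frac{\delta}{q-1}\right)[1,\ 1,\ \ldots,\ 1]^T, \qquad Fp_2 = \left[\frac{q\delta}{q-1},\ 0,\ \ldots,\ 0\right]^T.$$

Summing these up and using $1 - \delta - \delta/(q-1) = 1 - \delta/(1-q^{-1})$ yields

$$Fp = \left[1,\ 1-\frac{\delta}{1-q^{-1}},\ \ldots,\ 1-\frac{\delta}{1-q^{-1}}\right]^T.$$

Raising $Fp$ to the $d$-th power yields

$$(Fp)^d = \left[1,\ \left(1-\frac{\delta}{1-q^{-1}}\right)^d,\ \ldots,\ \left(1-\frac{\delta}{1-q^{-1}}\right)^d\right]^T.$$

Now, using the same splitting technique, $(Fp)^d$ can be decomposed into

$$(Fp)^d = \left(1-\frac{\delta}{1-q^{-1}}\right)^d[1,\ 1,\ \ldots,\ 1]^T + \left[1-\left(1-\frac{\delta}{1-q^{-1}}\right)^d,\ 0,\ \ldots,\ 0\right]^T.$$

Let $s_1$ and $s_2$ denote the first and second vectors on the right-hand side above. Define $\psi := (1 - \delta/(1-q^{-1}))^d$. Then the inverse DFTs of $s_1$ and $s_2$ can be evaluated analytically as

$$F^{-1}s_1 = [\psi,\ 0,\ \ldots,\ 0]^T, \qquad F^{-1}s_2 = q^{-1}(1-\psi)\,[1,\ 1,\ \ldots,\ 1]^T.$$

Summing the first elements of $F^{-1}s_1$ and $F^{-1}s_2$ gives $\psi + q^{-1}(1-\psi) = q^{-1} + (1-q^{-1})\psi$. This completes the proof of (90) and hence of Lemma 21. ■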
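The transform-domain argument can be cross-checked numerically. The sketch below computes the probability in (90) in three ways:

1. by $d$-fold $q$-point circular convolution of the pmf;
2. by $F^{-1}[(Fp)^d]$, with an explicitly built DFT;
3. from the closed form.

It assumes $q$ is prime, so that addition in $\mathbb{F}_q$ is addition modulo $q$. It also uses the unnormalized DFT convention $[Fp]_s = \sum_t p_t e^{-2\pi i st/q}$, which may differ from the convention of [50]. Under this convention, the first element of $F^{-1}$ applied to a vector is simply the vector's average. Function names are illustrative.

```python
import cmath


def sparse_pmf(delta, q):
    """delta-sparse pmf on Z_q: mass 1 - delta at 0 and delta/(q-1) on each nonzero symbol."""
    return [1 - delta] + [delta / (q - 1)] * (q - 1)


def circular_convolve(a, b):
    """q-point circular convolution of two pmfs on Z_q."""
    q = len(a)
    return [sum(a[j] * b[(i - j) % q] for j in range(q)) for i in range(q)]


def prob_zero_by_convolution(delta, q, d):
    """Mass at 0 of the d-fold circular convolution (pmf of a sum of d i.i.d. entries)."""
    p = sparse_pmf(delta, q)
    acc = [1.0] + [0.0] * (q - 1)  # pmf of the empty sum
    for _ in range(d):
        acc = circular_convolve(acc, p)
    return acc[0]


def prob_zero_by_dft(delta, q, d):
    """First element of F^{-1}[(F p)^d] with [F p]_s = sum_t p_t exp(-2 pi i s t / q)."""
    p = sparse_pmf(delta, q)
    omega = cmath.exp(-2j * cmath.pi / q)
    Fp = [sum(p[t] * omega ** (s * t) for t in range(q)) for s in range(q)]
    # the first row of F^{-1} is (1/q, ..., 1/q), so its first element is the average
    return (sum(v ** d for v in Fp) / q).real


def prob_zero_closed_form(delta, q, d):
    """Right-hand side of (90)."""
    return 1 / q + (1 - 1 / q) * (1 - delta / (1 - 1 / q)) ** d
```

The three routes agree to floating-point precision. The DFT route mirrors the proof: $Fp$ has a 1 in its first entry and the common value $1 - \delta/(1-q^{-1})$ elsewhere.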



Appendix F 
Proof of Lemma 12

Proof: The only matrix of rank $r = 0$ is the zero matrix, which is in $\mathcal{C}$ since $\mathcal{C}$ is a linear code (i.e., a subspace). Hence, the sum in (43) consists of only a single term, which is one. Now, for $1 \le r \le n$, we start from (43) and, by the linearity of expectation, we have

$$\mathbb{E}N_{\mathcal{C}}(r) = \sum_{M\in\mathbb{F}_q^{n\times n}:\,\mathrm{rank}(M)=r} \mathbb{E}\,\mathbb{I}\{M\in\mathcal{C}\} = \sum_{M\in\mathbb{F}_q^{n\times n}:\,\mathrm{rank}(M)=r} P(M\in\mathcal{C}) \overset{(a)}{=} \sum_{M\in\mathbb{F}_q^{n\times n}:\,\mathrm{rank}(M)=r} q^{-k} = \Phi_q(n,r)\,q^{-k},$$

where (a) holds because $M \neq 0$ (since $1 \le r \le n$), so that, as in (18), $P(M\in\mathcal{C}) = q^{-k}$. The proof is completed by appealing to the bounds on $\Phi_q(n,r)$ in Lemma 1, which provides upper and lower bounds on the number of matrices of rank exactly $r$. For the variance, note that the random variables in the set $\{\mathbb{I}\{M\in\mathcal{C}\} : \mathrm{rank}(M) = r\}$ are pairwise independent (see Lemma 6). As a result, the variance of the sum in (43) is the sum of the variances, i.e.,

$$\mathrm{var}(N_{\mathcal{C}}(r)) = \sum_{M:\,\mathrm{rank}(M)=r}\mathrm{var}(\mathbb{I}\{M\in\mathcal{C}\}) = \sum_{M:\,\mathrm{rank}(M)=r}\Big(\mathbb{E}\big[\mathbb{I}\{M\in\mathcal{C}\}^2\big] - \big[\mathbb{E}\,\mathbb{I}\{M\in\mathcal{C}\}\big]^2\Big) \le \sum_{M:\,\mathrm{rank}(M)=r}\mathbb{E}\,\mathbb{I}\{M\in\mathcal{C}\} = \mathbb{E}N_{\mathcal{C}}(r),$$

as desired. ■
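For very small parameters, the mean and variance of $N_{\mathcal{C}}(r)$ can be computed exactly by enumerating every choice of sensing matrices. The sketch below does this for $n = 2$, $q = 2$, $k = 2$. It assumes that $\mathcal{C} = \{M : \langle M, H_a\rangle = 0,\ a \in [k]\}$ is the linear code defined by i.i.d. uniform parity-check matrices $H_a$, and that $q$ is prime so that arithmetic modulo $q$ realizes $\mathbb{F}_q$. Over $\mathbb{F}_2$, distinct nonzero matrices are linearly independent, so pairwise independence holds. The variance then equals $\Phi_q(n,r)q^{-k}(1-q^{-k})$, which is at most the mean. Function names are illustrative.

```python
from fractions import Fraction
from itertools import product


def rank_mod_q(rows, q):
    """Rank over F_q (q prime) by Gauss-Jordan elimination on a list of rows."""
    m = [list(r) for r in rows]
    rank = 0
    for c in range(len(m[0])):
        piv = next((i for i in range(rank, len(m)) if m[i][c] % q), None)
        if piv is None:
            continue  # no pivot in this column
        m[rank], m[piv] = m[piv], m[rank]
        inv = pow(m[rank][c], -1, q)  # modular inverse (Python 3.8+)
        m[rank] = [(x * inv) % q for x in m[rank]]
        for i in range(len(m)):
            if i != rank and m[i][c] % q:
                f = m[i][c]
                m[i] = [(x - f * y) % q for x, y in zip(m[i], m[rank])]
        rank += 1
    return rank


def moments_of_rank_count(n, k, q, r):
    """Exact E[N_C(r)] and var(N_C(r)) for C = {M : <M, H_a> = 0, a in [k]},
    averaging over all q^(k n^2) choices of (H_1, ..., H_k)."""
    mats = list(product(range(q), repeat=n * n))  # flattened n x n matrices
    rank_r = [M for M in mats
              if rank_mod_q([M[i * n:(i + 1) * n] for i in range(n)], q) == r]
    counts = []
    for Hs in product(mats, repeat=k):
        # number of rank-r matrices satisfying all k parity checks
        counts.append(sum(
            all(sum(a * b for a, b in zip(M, H)) % q == 0 for H in Hs)
            for M in rank_r))
    mean = Fraction(sum(counts), len(counts))
    var = Fraction(sum(c * c for c in counts), len(counts)) - mean ** 2
    return mean, var, len(rank_r)
```

For $r = 1$ there are $\Phi_2(2,1) = 9$ matrices, so the mean is $9/4$ and the variance is $9 \cdot \tfrac14 \cdot \tfrac34 = 27/16$. For larger $q$, scalar multiples of the same matrix are not independent, so this exact-variance check is specific to $q = 2$.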



Appendix G 
Proof of Proposition 14

Proof: We first restate a beautiful result from [52]. For each positive integer $k$, define the interval $\mathcal{I}_k := [\log(k)/k,\ 1/2]$.

Theorem 22 (Corollary 2.4 in [52]). Let $M$ be a random $k \times k$ matrix over the finite field $\mathbb{F}_q$, where each element is drawn independently from the pmf in (39), with $\delta$, a sequence in $k$, belonging to $\mathcal{I}_k$ for each $k \in \mathbb{N}$. Then, for every $l \le k$,

$$P(k - \mathrm{rank}(M) \ge l) \le Aq^{-l}, \qquad (91)$$

where $A$ is a constant. Moreover, if $A$ is considered as a function of $\delta$, then it is monotonically decreasing in the interval $\mathcal{I}_k$.
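Theorem 22 can be illustrated, though of course not proved, by simulation. The sketch below draws $k \times k$ matrices over $\mathbb{F}_2$ whose entries equal 1 independently with probability $\delta$; this is the $q = 2$ case of the pmf (39). It then reports the average rank deficiency $k - \mathrm{rank}(M)$. Rows are packed into Python integers, so that Gaussian elimination over $\mathbb{F}_2$ reduces to XOR operations. The values $k = 64$ and $\delta = 0.2$ are illustrative assumptions; $\delta = 0.2$ lies in $\mathcal{I}_k$, since $\log(64)/64 \approx 0.065$.

```python
import random


def gf2_rank(rows):
    """Rank over F_2 of a matrix whose rows are packed into ints (bit j = column j)."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row marks its pivot column
        # clear that column from every remaining row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def sparse_gf2_matrix(k, delta, rng):
    """k x k matrix over F_2 with i.i.d. entries equal to 1 with probability delta."""
    return [sum(1 << j for j in range(k) if rng.random() < delta) for _ in range(k)]


def mean_rank_deficiency(k, delta, trials, seed=0):
    """Monte Carlo estimate of E[k - rank(M)] for the sparse ensemble above."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        total += k - gf2_rank(sparse_gf2_matrix(k, delta, rng))
    return total / trials


if __name__ == "__main__":
    print(mean_rank_deficiency(64, 0.2, 100))
```

A small average deficiency, bounded independently of $k$, is what (91) predicts. This is the property the proof below exploits to show that the vectorized sensing matrices span a space of dimension close to $k$.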

To prove Proposition 14, first define $N := n^2$ and let $h_a := \mathrm{vec}(H_a) \in \mathbb{F}_q^N$ be the vectorized versions of the random sensing matrices. Also let $H := [h_1\ \ldots\ h_k] \in \mathbb{F}_q^{N\times k}$ be the matrix with columns $h_a$. Finally, let $H_{[k\times k]} \in \mathbb{F}_q^{k\times k}$ be the square submatrix of $H$ consisting only of its top $k$ rows. Clearly, the dimension $m$ of the column span of $H$ satisfies $m \ge \mathrm{rank}(H_{[k\times k]})$. Note that $m$ is a sequence of random variables and $k$ is a sequence of integers, but we suppress their dependence on $n$. Fix $0 < \epsilon < 1$ and consider (using $m \le k$ in the first step)

$$P\left(\left|\frac{m}{k} - 1\right| > \epsilon\right) = P\left(\frac{m}{k} < 1-\epsilon\right) \le P\left(\frac{\mathrm{rank}(H_{[k\times k]})}{k} < 1-\epsilon\right) = P\left(k - \mathrm{rank}(H_{[k\times k]}) > \epsilon k\right) \overset{(a)}{\le} Aq^{-\epsilon k}, \qquad (92)$$

where, for (a), recall that $k \in \Theta(n^2)$ and $\delta \in \Omega(\log n / n)$. These facts imply that $\delta$ (as a sequence in $n$) belongs to the interval $\mathcal{I}_k$ for all sufficiently large $n$, because any function in $\Omega(\log n / n)$ dominates the lower bound $\log(k)/k$ for $k \in \Theta(n^2)$. The hypothesis of Theorem 22 is therefore satisfied, and applying (91) (with $l = \epsilon k$) gives inequality (a). Since (92) is a summable sequence, by the Borel-Cantelli lemma, the sequence of random variables $m/k \to 1$ a.s. ■